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The  main  object  of  this  research  has  been  to  generate  digital  terrain  elevation  data 

from  a  stereopair  of  digitized  aerial  photographs.  To  achieve  this,  a  systematic  approach  was 

developed.  This  approach  involves  1)  preliminary  preparation  of  the  digitized  photographs 

using  pattern  recognition  techniques,  in  particular,  with  the  Fourier  transform;  2)  performing 

multiresolutional,  feature-based  stereo  matching  on  the  stereo  pair.  Photogrammetric 

techniques  were  then  used  to  calculate  the  terrain  elevation  data  using  coordinates  derived 

from  the  stereo  pair. 

Pattern  recognition  by  employing  the  Fourier  transform  was  successfiil  in  locating 
the  positions  of  fiducial  marks  in  the  aerial  photos.  Multiresolutional,  feature-based  stereo 

XV 


matching  was  implemented  by  applying  geometrical  and  similarity  constraints.  Laplacian 
and  Gaussian  filters  with  different  sizes  and  Mallat's  wavelet  transform  were  used  to 
decompose  the  original  images  into  a  series  of  images  with  different  scales.  The  terrain 
elevation  data  collected  from  the  developed  system  with  different  multiresolution  approaches 
was  subsequently  compared  with  data  collected  by  manually  operating  the  analytical  stereo 
plotter,  Kern  DSR-14.  The  results  show  that  the  elevation  data  collected  by  the  developed 
approach  is  very  similar  to  the  data  collected  by  manually  operating  the  stereo  plotter. 

Compared  with  the  digital  terrain  model  generated  by  manually  operating  the  stereo 
plotter  and  the  digital  terrain  models  generated  by  the  developed  approach,  it  was  foimd  out 
that  the  developed  approach  can  successfully  generate  a  digital  terrain  model  from  a  given 
stereo  pair.  The  C  factor  of  the  developed  system  is  around  2000  for  the  test  images,  and  is 
very  close  to  the  C  factors  of  analytical  stereoplotters.  During  experiments,  it  was 
discovered  that  when  an  area  is  covered  with  many  trees  and  bushes,  it  is  very  difficult  to 
locate  the  stereo  correspondences  from  the  given  stereo  pair.  The  developed  system  still  has 
difficulty  to  solve  this  problem.  Two  image  pyramids,  Laplacian  and  Gaussian  filter  with 
different  sizes  and  Mallat's  wavelet,  were  used  to  create  a  series  of  images  with  different 
scales.  Then,  the  stereo  matching  system  developed  in  this  research  was  applied  to  generate 
digital  terrain  model  from  different  image  pyramids.  It  was  found  out  that  the  digital  terrain 
models  generated  by  applying  different  image  pyramids  were  very  similar.  However, 
Mallat's  image  pyramid  has  faster  processing  speed  than  that  of  Laplacian  and  Gaussian 
image  pyramid. 
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CHAPTER  1 
INTRODUCTION 


1  ■  1  General  Description 

The  goal  of  this  research  is  to  extract  terrain  elevation  data  from  a  digitalized 
stereopair  of  a  specified  surface  of  earth  by  applying  a  digital  photogrammetric  approach. 
Digital  photogrammetry  is  concerned  with  a  computer-aided  method  to  implement  the 
stereoscopic  interpretation  of  imagery.  The  most  crucial  and  difficult  step  for  digital 
photogrammetry  is  to  locate  points  from  one  digital  image  of  a  stereopair  with  respect  to  the 
corresponding  points  on  another.  After  these  points  are  located,  a  series  of  procedures, 
including"^  aerial  triangulation,  can  be  used  to  derive  the  3-D  coordinates  of  ground  objects 
that  show  on  the  stereopair. 

Digital  photogrammetry  is  an  efficient  way  to  collect  data  from  digitized  photos  to 
create  a  digital  terrain  model  of  the  specified  area.  As  known,  the  collected  data  from  a 
human  operator  always  have  a  problem  in  terms  of  reliability  and  consistency.  Even  for 
experienced  operators,  if  they  cannot  pay  their  full  attention  to  data  capture,  the  collected 
results  still  have  the  same  reliability  and  consistency  problem  as  that  of  an  inexperienced 
operator.  Current  optical-analog  instruments,  such  as  analytic  stereoplotters,  can  provide 
a  very  precise  and  accurate  way  for  photogrammetric  data  capture.  In  fact,  this  kind  of  data 
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capture  relies  heavily  on  a  human  operator's  interpretative  skill  and  experience  and  involves 
a  great  deal  of  time  for  labor.  With  the  advances  of  computer  technology,  it  is  possible  to 
collect  data  systematically  by  applying  computer  algorithms  to  significantly  reduce  an 
operator's  work  load  and  improve  reliability  and  consistency. 

1 .2  Statement  of  Problem 

Digital  terrain  modeling  is  a  term  used  to  describe  the  process  of  representing  the 
physical  surface  of  the  earth  by  means  of  a  series  of  data  points  captured  from  techniques 
such  as  traditional  ground  surveying,  digitizing  existing  topographic  maps,  and 
photogrammetry.  Photogrammetry  is  known  as  the  most  efficient  approach  to  collect  data 
with  high  accuracy  for  a  digital  terrain  model  (DTM).  Traditional  photogrammetric  data 
capture  is  based  on  operating  a  stereo  plotter. 

By  operating  a  stereo  plotter,  a  person  can  quickly  find  a  particular  point  from  a  aerial 
photo,  and  rapidly  locate  stereo  correspondences  from  a  stereopair  with  little  effort,  but  a 
computer  needs  a  lot  of  time,  enormous  amount  of  computation,  and  sophisticated  algorithms 
to  accomplish  the  same  process.  On  the  other  hand,  a  human  being  cannot  concentrate  on 
one  thing  for  too  long  because  of  his  or  her  physical  and  mental  limits,  while  a  computer  can 
maintain  its  performance  indefinitely. 

Fundamentally,  two  kinds  of  image  matching  difficulties  need  to  be  solved  by  the 
digital  photogrammetric  technique.  First  is  to  find  the  particular  points  (to  identify  the 
positions  of  an  object)  and  to  locate  the  stereo  correspondences  in  a  digitized  stereo  pair. 


The  second  is  to  find  the  position  of  a  particular  object  (  like  a  fiducial  mark  shown  on 
imagery)  and  some  criteria  must  be  established  when  using  a  computer  in  place  of  a  human 
operator.  Although  different  criteria  have  been  established,  to  find  a  fiducial  mark  in  digital 
imagery  is  still  a  time-consuming  job,  and  requires  excessive  computation.  Sometimes,  it 
occurs  that  the  particular  point  cannot  be  found  or  the  position  for  the  found  point  is  not 
correct. 

To  locate  the  stereo  correspondences  in  a  given  stereo  pair  is  difficult.  A  point  shown 
on  the  left  image  of  the  given  stereopak  and  its  correspondence  shovra  on  the  right  image  of 
the  given  stereopair  can  be  identified  by  a  human  operator  by  applying  some  characteristics 
of  the  point.  For  example,  the  relative  position  of  the  point  to  the  whole  image,  the 
V  brightness,  and  the  tangent  direction  of  an  edge  in  the  area  of  the  particular  point  can  be  used. 
Traditionally,  two  kinds  of  approaches  have  been  widely  applied  by  photogrammetrists  to 
locate  stereo  correspondences.  One  is  an  area-based,  and  another  one  is  a  feature-based 
matching  method.  The  area-based  matching  method,  comparing  the  neighborhoods  of  the 
particular  point  shown  on  different  images,  needs  extensive  computation,  but  it  can  reach  a 
very  high  degree  of  accuracy.  Feature-based  matching  method,  compares  the  distinctive 
features  ( like  edges )  between  different  images  to  locate  the  stereo  correspondences.  It  is 
less  accurate  but,  because  the  method  only  processes  the  distinctive  features  in  imagery 
instead  of  all  the  gray  values  of  the  whole  image,  processing  is  fast.  How  to  establish  a 
schematic  approach  to  provide  sufficient  accuracy  with  reasonable  processing  time  is  still 
a  research  topic  in  photogrammetry. 


1 .3  Scope  of  the  Research 
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This  research  focuses  on  estabUshing  a  schematic  approach  to  collecting 
photogrammetric  data  for  a  DTM  using  computer  algorithms  to  reduce  the  operator's  work 
load.  The  procedure  used  by  photogrammetrists  to  produce  a  DTM  may  be  divided  into  the 
following: 

Step  1  )  Image  acquisition  and  scanner  calibration. 

Step  2  )  Measuring  the  image  coordinates  of  fiducial  marks  and  photo  control  points. 

Step  3  )  Orienting  the  given  digital  stereo  pair. 

Step  4 )  Stereo  Matching. 

Step  5  )  Generating  the  digital  terrain  model. 

In  general,  image  acquisition,  digitizing  an  aerial  photograph,  and  calibrating  the 
digitized  image  are  necessary  to  be  performed  before  further  processing.  The  aerial  photos 
used  in  this  research  cover  the  University  of  Florida  campus.  For  this  research,  it  was 
assumed  that  the  scanner  distortion  was  negligible. 

In  fact,  there  are  two  major  goals  which  need  to  be  accomplished  for  collecting 
terrain  elevation  data  in  photogrammetry.  One  is  to  automatically  locate  the  position  of  an 
interest  point  shown  on  a  digitized  image,  and  another  one  is  to  implement  the  stereo 
matching  in  a  pair  of  images  as  human  eyes  do  with  stereo  photos.  Traditionally,  an 
operator  through  the  use  of  a  pointing  device,  picks  the  positions  of  interest  points  shown  on 
a  digitized  photo  when  the  photo  is  shown  on  a  computer  display.  In  this  research,  a  method 
of  comparing  the  textures  between  a  predefined  template  and  a  scene  containing  the  template 


by  using  Fourier  Transform  is  employed  to  automatically  indicate  the  position  of  the 
template  shown  in  the  scene.  For  example,  during  measurement  of  the  positions  of  the 
fiducial  marks  in  the  imagery,  the  method  was  attempted  to  locate  those  positions  on  the 
given  image.  From  experiment  results,  the  fiducial  marks  in  the  imagery  can  be  efficiently 
and  accurately  located.  Furthermore,  if  the  photo  control  points  are  covered  with  an  artificial 
marks,  the  method  can  correctly  identify  their  positions  also.  Unfortunately  the  photo 
control  points  used  in  this  research  are  natural  features  with  coordinates  derived  from  aero- 
triangulation,  so  some  manual  operations  were  needed  in  this  research  to  locate  their 
positions. 

As  for  stereo  matching,  its  basic  function  is  to  find  the  image  of  a  particular  feature 
on  one  photo  and  the  corresponding  image  on  the  other  photo.  Fundamentally,  stereo 
matching  is  to  simulate  the  behavior  of  human  eyes  when  a  person  views  on  a  stereo  pair, 
and  then  a  3D  scene  is  shown  from  the  given  stereo  pair.  Roughly,  stereo  matching  can  be 
implemented  by  comparing  the  difference  of  different  digital  number  (  DN  )  distributions 
presented  in  the  given  stereo  pair  ( called  area-based  matching ),  or  comparing  the  similarity 
between  features  shown  on  the  given  stereo  pair  (  called  feature-based  matching).  Both 
methods  have  been  widely  used  to  implement  stereo  matching  in  photogrammetry.  In  this 
research,  feature-based  approach  was  employed  to  implement  stereo  matching  procedure 
because  the  method  can  efficiently  reduce  the  computational  burden.  If  suitable  constraint 
conditions  are  added,  the  matched  results  can  reach  a  degree  of  high  accuracy.  In  addition, 
epipolar  geometry  is  applied  to  reduce  the  searching  time  during  the  matching  procedure. 
*  .        In  this  research,  for  stereo  matching,  a  method  based  on  comparing  the  features  at 
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different  resolutions  was  developed,  and  the  modified  epipolar  geometry  was  employed  not 
only  to  reduce  searching  ranges  but  also  provide  a  stronger  geometry  to  achieve  a  better 
matching  result  than  with  traditional  approaches.  Then,  the  DN  distributions  from  the 
matched  results  were  compared  to  make  sure  of  the  correctness  of  the  matched  results.  The 
multiresolution  approach  was  used  to  decompose  the  original  image  into  a  series  of  images 
with  different  resolutions.  This  series  of  images  with  different  resolutions  is  called  an  image 
pyramid.  In  this  research,  the  image  pyramids  created  by  applying  a  Laplacian  and  Gaussian 
technique  and  Mallat's  wavelet  approach  were  studied. 

Traditionally,  Laplacian  and  Gaussian  filters  with  different  sizes  have  been  widely 
employed  to  create  an  unage  pyramid.  As  for  Mallat's  wavelet,  some  unportant  information, 
which  is  eliminated  if  Laplacian  and  Gaussian  filter  is  applied,  is  kept  after  the  wavelet 
transform  has  been  completed.  This  information,  for  example,  is  very  usefiil  to  restore  an 
image  from  its  compressed  image  produced  by  using  a  wavelet  transform.  This  research 
studied  the  feasibility  of  applying  the  information  to  generate  a  DTM  from  a  given  stereo 
pair,  and  compared  the  difference  between  the  DTMs  created  by  employing  different  image 
pyramids. 

The  organization  of  this  dissertation  is  described  as  follows:  Chapter  2  provides  a 
backgroimd  in  digital  matching  techniques  and  photogrammetric  relationships;  Chapter  3 
introduces  the  experiment  materials  and  equipment;  in  Chapter  4  the  schematic  approach  for 
collecting  photogrammetric  data  is  proposed.  The  results  of  this  research  are  reported  in 
Chapter  5.  Chapter  6  discusses  and  evaluates  the  difference  between  different  DTMs 
generated  by  employing  different  approaches.  Finally,  conclusions  and  recommendations 
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for  future  study  are  made  in  Chapter  7. 


■I 


■1 


CHAPTER  2 
1  v:  BACKGROUND 

2.1  Introduction 

With  the  advances  of  computer  technology,  there  is  a  tendency  to  collect  data 
schematically  by  applying  computer  interpretations  instead  of  those  of  a  human  operator. 
Photogrammetrists  have  proposed  many  theories  and  experiments  attempting  to  equal  the 
accuracy  of  analj^ical  photogrammetry  by  computational  methods.  Some  of  these  theories 
and  experiments  ( including  computer  vision  and  photogrammetric  techniques )  are  roughly 
described  in  this  chapter. 

Within  this  chapter,  Section  2.2  introduces  the  basic  principles  and  methods  used  in 
area-based  matching;  Section  2.3  presents  the  fundamental  theories  and  method  employed 
in  feature-based  matching;  Section  2.4  shows  photogrammetric  techniques  applied  to 
generate  the  digital  terrain  model. 

2.2  Area-Based  Matching 

Area-based  matching  is  based  upon  comparing  the  differences  between  the  DNs  of 
a  particular  area  in  a  digitized  stereopair  to  decide  whether  the  points  on  one  image  and  their 
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corresponding  points  on  another  image  represent  the  same  groimd  objects  or  not. 
Fundamentally,  the  Fourier  correlation''*''^],  the  normalizing  cross  correlatiori'^''^^!'^''  ,  the 
iterative  least-squares  method'^''^"''^'''''^'''*'',  and  a  miscellany  of  other  methods''^'''^'  have  been 
widely  adapted  by  photogrammetrists  in  their  research. 

As  illustrated  in  Figure  2.1,  at  the  beginning  a  candidate  area  in  the  left  image  and 
a  search  window  in  the  right  image  are  specified.  Then,  search  areas  within  the  search 
window  are  examined  searching  for  a  gray-value  distribution  most  similar  to  that  of  the 
candidate  area.  The  basic  assumption  of  this  method  is  that  a  substantial  number  of  scenes 
shown  in  the  imagery  represent  the  continuous  surfaces  of  some  particular  area;  therefore, 
adjacent  pixels  in  the  imagery  represent  a  series  of  continuous  points  in  object  space '^l 
2.2-1  Fourier  Correlation 

The  Fourier  transform  is  a  very  popular  tool  for  analyzing  signals,  and  it  provides  a 
tool  to  study  the  property  of  digital  signals  in  the  frequency  domain.  In  fact,  dealing  with 
the  problem  of  digital  image  processing  is  equal  to  dealing  with  processing  two-dimensional 
digital  signals.  The  Fourier  transform,  F(u,v),  of  a  candidate  area,  f(x,y),  can  be 
expressed  as  shown  in  Equation  2-1 : 

M-l  M-1  -27Iux/m  -ZJIvyy/M 

F(u,v)  =  -L  5:  5:f(x,y)e      ^      e      ^  (2-1) 

M  ^    x=0  y=0 

where  the  M  is  the  size  of  the  selected  candidate  area. 
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Similarly,  the  search  area,  g(x,y),  is  computed  in  the  same  manner.  Then,  the 
correlation  function  i|f(k,l)is  calculated  by  Equation.  2-2 

M-l  M-1  27lkuv/~T  27lvlvrr 

t(k,l)  =  E  E  [F(u.v)G'(u,v)]e     ^     e     ^  (2-2) 

u=0  v=0 

where  *  indicates  the  complex  conjugate  and  its  value  is  called  the  correlation  coefficient. 

Figure  2.2  shows  a  digital  image  of  a  fiducial  mark  from  the  given  aerial  photos,  and 
Figure  2.3  shows  the  magnitude  information  of  the  Fourier  transform  of  the  fiducial  mark 
displayed  as  an  intensity  image.  Both  images  are  scaled  linearly  for  display  on  an  8-bit 
system;  namely,  the  DN  range  of  each  pixel  in  the  digital  image  is  between  0  and  255. 

The  correlation  fiinction  represents  the  distribution  similarity  of  DNs  in  two  images. 
In  general,  the  value  of  ilf(k,l)  is  between  1  and  -1.  When  the  value  of  i|;(k,I)  is  very 
close  to  1  or  -1,  the  DN  distribution  between  the  candidate  and  search  window  is  almost  the 
same.  By  comparing  the  value  of  correlation  function  between  the  candidate  and  search  area, 
the  position,  whose  correlation  coefficient  is  the  maximum  value  in  the  search  window  and 
larger  than  the  established  threshold,  is  the  place  where  the  match  is  declared. 
2.2-2  Normalized  Cross  Correlation 

In  linear  statistics,  the  coefficient  of  measuring  the  strength  of  the  relationship 
between  two  distributions  is  called  the  coefficient  of  linear  correlation,  or,  simply,  the 
correlation  coefficient.  Photogrammetrists  have  widely  used  this  method  in  stereoscopic 
interpretation  of  imagery.  The  correlation  coefficient  can  be  calculated  as  shown  in  Equation 
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where 

SxY  =  E(X  -  X)(Y 
Sxx  =  E(X  -  X)'  = 
Syy  =  E(Y  -  Y)^  - 
X  :  DN  in  the  candidate  area 
Y  :  DN  in  the  search  area 
X :  mean  DN  in  the  candidate  area 
Y :  mean  DN  in  the  search  area 
n  :  the  number  of  pixel  in  the  candidate  or  search  area 

In  general,  the  value  of  the  coefficient  is  between  1  and  -1 .  Where  the  |  r  |  is  maximal 
and  larger  than  the  established  threshold  in  search  window,  the  matching  job  has  been 
accomplished;  namely,  the  candidate  and  search  area  have  very  similar  distributions  of  DNs. 
Otherwise,  the  search  area  passes  through  in  the  search  window  until  the  maximum  value  of 
I  r  I  is  met,  compared  to  the  other  values  of  |  r  |  in  the  search  window.  If  the  value  of  |  r  |  is 
smaller  than  the  established  threshold,  the  matching  job  fails. 
2.2-3  The  Iterative  Least-Squares  Method 

The  iterative  least-squares  method  has  been  widely  applied  in  the  adjustment  of  the 
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measured  data  in  surveying,  and  is  the  technique  which  is  familiar  to  photogrammetrists. 
Photogrammetrists  have  widely  introduced  the  least-squares  method  into  stereo  matching. 
Consequently,  many  papers  applying  different  least-squares  theories  have  been  published  in 
photogrammetric  and  computer  society  proceedings'^l''*^'^'^'''^'t'"'.  No  matter  how  complicated 
the  algorithms  that  have  been  proposed,  the  basic  theory  is  the  same  and  is  introduced  as 
follows. 

First,  an  assumption  is  made  that  there  is  a  first  order  polynomial  relationship 
existing  between  the  given  stereo  pair;  that  is,  if  the  DNs  of  a  candidate  area  and  a  search 
area  in  a  given  stereopair  represent  the  same  ground  objects,  then  the  relationship  between 
these  two  sets  of  DNs  can  be  described  by  a  polynomial  function.  This  polynomial 
relationship  between  the  given  stereo  pair  is  represented  by  Equation  2-4  through  Equation 
2-6. 

^(^L'VO  =  K  +  ^RC'^r-Xr)  (2-4) 
H  =  "o  ^  ^i^R„  +  «2yR.  (2-5) 

yR  =  *o  +  ^I'^R.  +  ^yR„  (2-6) 

:  the  candidate  area  in  the  left  image 
:  the  search  area  in  the  right  image 
:  the  calibration  factors  for  the  radiometric  shift  and  scale 


where 

L(xL,yL) 
R(x^,y^) 
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Xp ,  Yp  :  the  initial  approximation  of  the  pixel  coordinates  in  the  right  image 

,  :  the  transformed  pixel  coordinates  of  search  area 

,  flj ,  :  the  coefficients  of  a  2D  affme  transform  in  x  direction 

b^,b^,b^        :  the  coefficients  of  a  2D  affme  transform  in  y  direction 
The  linearized  form  of  Equation  2-4  is  shown  as  follows. 

LCXl.Yl)  "  R(Xp,yp)  + 

R  da.  +  x^R  da,  +  y„R  da, 

R  db.  +  x^R  db,  +  y^R  db.  ^  '  ^ 

y     0         P   y     I  P   y  2 

dh^  +  R(Xp,yp)dA, 

where 

^  dR  ^  R(x  +  l,y)  -  R(x-l,y) 

'       dx  2 
^    ^  aR  ^  R(x,y  +  1)  -  R(x.y-l) 

"      dy  2 
After  the  computation  is  done,  the  re-sampling  techniques,  such  as  bilinear  or  bicubic 

interpolation,  will  be  employed  in  the  search  area  to  refine  the  accuracy  fi-om  whole  pixel  to 

sub-pixel  level.   As  illustrated  in  Figure  2.4,  the  bilinear  resampling  is  calculated  by 

Equation  2-8.  Then,  the  roles  of  candidate  area  and  search  area  are  switched;  namely,  the 

candidate  area  becomes  the  search  area,  and  the  search  area  becomes  the  candidate  area.  The 

whole  procedure,  including  the  least-squares  and  the  re-sampling,  is  re-employed  in  the 

reverse  sense  with  the  "new"  candidate  and  search  areas.  Some  photogrammetrists  suggest 

that  the  procedure  mentioned  above  should  be  iteratively  applied  until  the  computed 

magnitudes  do  not  have  significant  changes. 
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I(x-,y)  =  (1  -  i)[(l  -  ^:)I(x,y)  +  ^I(x,y  +  1)]  +  ,2-8^ 
Z,[(l  -  ^)I(x  +  l,y)  +  ^:i(x  +  l,y  +  1)] 

where 

I(x,y) :  representing  the  DN  of  a  image  at  position  (x,  y) 

K,  L  :  represent  the  fractional  column  and  row  locations,  respectively 

I(  x',  y' ) :  represent  resampled  intensity 

The  least-squares  method  requires  considerable  computation  compared  with  the 
cross-  correlation  and  Fourier  transform.  Some  photogrammetrists  suggest  that  the  size  of 
the  candidate  area  be  not  less  than  20  x  20  pixels  to  reduce  reliability  problems 
However,  the  larger  the  window  size  of  the  candidate  area  have,  the  more  computation  is 
needed. 

2.2-4  Miscellany  of  Other  Methods 

In  the  frequency  domain,  the  phase  contains  considerable  information  about  a 
signal'^l  Scientists  have  aheady  proven  that  it  is  possible  to  nearly  recover  an  original  image 
using  only  its  phase  information  rather  than  its  magnitude  information^''^.  Currently,  two 
methods  (window  Fourier  transform  and  Gabor  wavelet )  have  been  used  to  transform  the 
image  from  the  spatial  domain  to  the  frequency  domain.  The  phase  difference  indicates  the 
spatial  shift  or  disparity'^°^^^'^'''l.  Then,  a  constraint  in  which  the  error  fimction  is  minimum, 
is  applied  such  that  the  root  of  the  fiinction,  spatial  shift,  can  be  found  by  employing 
numerical  techniques  to  solve  the  partial  equation.  The  methods  described  above  have  their 
advantages,  the  matched  results  have  a  high  degree  of  accuracy,  and  a  parallel  algorithm  is 
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available  to  increase  processing  speed.  However,  these  methods  are  still  in  the  testing  stages, 
and  have  not  been  widely  adopted  by  photogrammetrists  and  computer  scientists. 

2.3  Feature-Based  Matching 

Scientists  suggest  that  feature-based  matching  may  be  a  better  method  to  deal  with 
the  matching  problem'^^.  Feature-based  matching  uses  the  features  which  are  extracted  from 
the  DNs  of  a  digital  image  to  locate  the  correspondences  of  a  stereo  pair.  With  recent 
research,  using  the  features  can  reach  a  high  degree  of  accuracy,  and  the  processing  speed 
of  feature-based  matching  is  faster  than  that  of  area-based  matching  because  feature-based 
matching  employs  only  the  features,  like  points  and  edges,  in  dealing  with  the  matching 
problem  rather  than  the  whole  sub-image  itself  (  candidate  or  search  area).  However,  for 
those  images  without  enough  features,  it  is  very  difficult  to  generate  a  complete  DTM  by 
using  only  limit  features. 

Scientists  have  proposed  a  theory  that  human  eyes  can  decompose  a  scene  that  both 
eyes  see  into  several  levels  such  that  the  information  contained  in  the  scene  is  broken  down 
into  different  scales.  Then,  both  eyes  start  to  process  the  information  at  the  coarsest  level, 
which  contain  less  information  than  that  of  the  others.  After  the  information  at  the  coarsest 
level  has  been  processed,  the  eyes  start  to  process  the  information  at  the  fmer  levels  until 
every  level  has  been  processed.  Then,  a  signal  is  passed  to  human  brain  to  inform  the  brain 
what  the  eyes  saw.  In  fact,  the  whole  procedure  is  done  in  a  few  micro  seconds,  or  even  less. 
Consequently,  many  corresponding  computer  algorithms  for  extracting  features  at  different 
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levels  have  been  proposedt'''i[^''if^'it^*'t'^it57]  f^gj.g^  j^j^^^jg  methods,  using  Gaussian  and 
Laplacian  filters  and  Mallat's  wavelet,  to  decompose  the  original  image  into  a  series  of 
images  with  different  scales  are  introduced.  The  difference  between  these  two  methods  is 
discussed. 

2.3-1  Laplacian  and  Gaussian  Representation 

An  edge  in  an  image  reflects  the  phenomenon  of  the  physical  changes  in  the  object 
space.  Commonly,  an  image  is  smoothed  by  a  filter,  such  as  a  Gaussian  filter,  and  then  the 
second-order  derivative  of  the  DNs,  Laplacian  filter,  of  the  filtered  unage  is  determined.  The 
second-order  derivative  of  the  filtered  image  represents  the  abrupt  changes  of  DNs  in  the 
image.  Hence,  those  positions  whose  second-order  derivative  is  equal  to  zero  are  defined  as 
the  edges  in  the  filtered  image,  and  are  called  zero-crossingt^'*it^'i.  The  Gaussian  function  is 
described  in  Equation  2-9. 


where  a  determines  the  cutoff  frequency  with  large  ct  corresponding  to  lower  cutoff 
fi-equency.  For  example,  the  3D  sketch  of  the  Gaussian  function  with  =  4  is  shown  in  the 
Figure  2.5.  The  Laplacian  operator,  the  second-order  derivative  operator,  is  presented  in 
Equation  2-10. 


+ 


(2-10) 
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The  convolution  of  an  image,  f(x,y),  and  the  Laplacian  and  Gaussian  operator  is  written  as 


where  *  indicates  the  convolution  operation  and  the  filter,  g  ( x ,  y ) ,  is  called  Laplacian  and 
Gaussian  filter,  LOG.  For  example,  the  3D  sketch  of  LOG  with  =  4  is  shown  in  the 
Figure  2.6. 

The  main  reason  for  employing  the  Gaussian  fiinction  is  that  it  is  smooth  and 
localized  in  the  spatial  and  fi-equency  domain'^ 'I  As  illustrated  in  Figure  2.7, 
photogrammetrists  have  implemented  a  series  of  images  with  different  scales  by  convolving 
the  original  image  with  different  sizes  of  the  Laplacian  and  Gaussian  filter.  This  series  of 
images  created  by  using  the  LOG  with  different  sizes  is  called  a  LOG  image  pyramid.  For 
example,  the  image  shown  in  Figure  2.9  is  created  by  convolving  the  original  image,  shown 
in  Figure  2.8,  with  a  33  x  33  LOG  filter.  Then,  the  edges  are  located  by  finding  those  places 
where  the  DN  and  their  neighbors  are  different  signs^^''^^^''^'*^'.  All  possible  conditions  for  a 
pixel  with  4  neighbors  are  illustrated  in  Figure  2.10.  Consequently,  the  edges  are  matched 
at  the  coarsest  level  by  comparing  contrast  sign  and  orientation  between  the  filtered  left  and 
filtered  right  image  t'^P-'lPS]  xhese  matched  positions  at  the  coarsest  level  are  provided  as 
the  approximate  ones  for  the  next  finer  level  such  that  the  searching  range  at  each  level  can 
be  reduced.  Then,  the  matching  procedure  is  applied  in  the  limited  searching  range.  The 


^(fCx.y)  *  g(x,y))  =  f(x,y)  *  V2g(x,y) 


(2-11) 


V^g(x.y)  =  (x^  +  y'  -  271^0)- 


(2-12) 
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whole  procedure  mentioned  above  will  be  re-employed  until  the  matching  job  at  the  finest 
level  is  complete.  Even  though  the  zero-crossing  has  been  successfully  employed  to  locate 
the  stereo  correspondences  in  a  stereo  pair,  there  are  some  difficulties  remaining.  For 
example,  by  applying  the  LOG  filter  of  a  large  size,  the  edges  detected  by  applying  zero- 
crossing  from  the  filtered  image  have  some  phenomena,  like  edge  dislocations,  missing 
edges,  and  false  or  spurious  edges  [^^I^^IP^^.  The  reasons  for  these  phenomena  are  still  unclear. 
2.3-2  Wavelet  Representation 

The  Laplacian  and  Gaussian  filter  is  not  the  only  way  to  decompose  an  image  into 
a  series  of  images  with  different  scales.  Recently,  mathematicians  have  found  that  there  exist 
some  fimctions,      which  satisfy  following  properties: 


(2-13) 


and 


A  <  J2  Ilf(2  <  B 


(2-14) 


-I 


where 


C,,,  is  a  constant 


t}r is  the  Fourier  Transform  of  i|;  and  f°°}^(x)dx  =  0 


A  and  B,  with  0  <  A  ^  B  <  °°,aie  constants  independent  of  (ji^^^K 
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These  functions,  i|; ,  are  called  basic  wavelets,  and  are  smooth  and  localized  in  the 
spatial  and  frequency  domains.  Especially  for  high  frequency  information,  the  wavelet 
transform  can  describe  edges  better  than  the  other  methods  There  is  some  biological 
evidence,  human  eyes  decompose  a  scene  into  several  spatially-oriented  frequency  channels 
^^''1.  It  seems  that  the  multiresolution  representation  with  wavelet  transform  is  very  similar 
to  the  way  that  human  visual  system  analyzes  a  scene  ''''i. 

The  wavelet  transform  of  a  fimction,  f  ( x ) ,  with  respect  to  the  wavelet  i|;  ( x )  is  equal 

to  the  wavelet  transform  of  its  pth  derivative.  For  example,  Mallat  introduced  a  function, 

a ( X ) ,  to  create  a  wavelet,  i|/ ( x ) ,  such  that  i|f  ( x )  =  <^^M  jhe^,  the  wavelet  transform 

dx 

of  f(x)  at  the  scale  s  and  position  x,  computed  with  respect  to  i|f(x),  is  defined  by 

W  f(x)  =  f  *  U;  (x)  =  s— (f  *  a)(x)t^*l.    For  a  2-D  image,  f(x,y),  its  wavelet 
'  '  dx 

fransform,  W^f(x),  is  : 

W,f(x)  =  (f  *  il;^)(x)  (2-15) 

Euqation  2-15  is  presented  in  Equations  2-16  and  2-17  for  x  and  y  directions,  respectively: 

w;f(x,y)  =  (f  *  il;')(x,y)  (2-16) 

:  W>(x,y)  =  (f  *  l|f^)(x,y)  (2-17) 

Then,  its  magnitude  and  angle  is  given  in  Equation  2-18  and  2-19,  respectively. 

M,f(x,y)  =^|w;f(x,y)|^  +  |w/f(x,y)p  (2-18) 
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f(x,y)  =tan-'   (2-19) 

IW.'fCx.y)! 

Basically  the  features  of  an  image,  like  edges,  are  extracted  by  detecting  the  local 
maxima  of  M  ^  f  ( x ,  y ) .  In  one  dimension,  the  inflection  point  of  a  function  can  be  detected 
by  employing  the  first-order  derivative  of  the  function.  Its  value  at  this  point  can  be  either 
the  local  maximum  or  minimum  value  of  the  function  at  a  particular  interval.  From  the 
definition  of  an  edge,  the  local  maximum  value  of  the  function,  M^f(x,y),  is  the 
information  that  we  are  looking  for.  Traditionally  zero-crossing  has  been  wddely  adopted  to 
locate  those  points  whose  DNs  change  abruptly  in  an  image.  However,  it  is  difficult  to 
distinguish  small  fluctuations  occurring  in  the  high  frequency  part  by  employing  the  zero- 
crossing  approach  because  the  Laplacian  and  Gaussian  filter  used  in  zero-crossing  approach 
is  a  low-pass  filter:  so,  all  the  information  at  high  frequencies  will  be  eliminated'^^l  The 
wavelet  transform  provides  a  tool  to  analyzes  those  small  fluctuations  occurring  at  a  high 
frequency  because  it  divides  the  data  into  several  frequency  components,  and  then  examines 
each  component  corresponding  to  the  appropriate  resolution''"'.  In  fact,  scientists  have 
proven  that  when  the  scales  decrease,  the  wavelet  transform  of  a  fimction  at  the  scale  is 
sensitive  to  those  small  fluctuations  occurring  at  high  frequency'^^l  As  for  the  stereo 
correspondences,  it  is  completed  by  comparing  the  magnitude  and  the  orientation  of  a  point 
in  the  left  transformed  image  and  its  corresponding  point  in  the  right  image'^"'. 

Figure  2.11  represents  the  magnitude  information  of  the  transformed  image  by 
applying  Mallat's  wavelet  on  the  original  image,  shown  on  the  Figure  2.8,  at  the  scale  2  ". 
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Figure  2.12  shows  the  angle  information  at  the  scale  2"^.   The  magnitude  and  angle 
information  illustrated  here  has  been  linearly  stretched  such  that  the  intensity  values  shown 
in  the  image  are  distributed  between  0  and  255.  Figure  2.13  represents  the  local  maxima 
extracted  from  the  Figure  2.11. 
2.3-3  Difference  between  Different  Image  Pyramids 

In  this  research,  two  kinds  of  image  pyramids  were  used  to  represent  a  series  of 
images  at  different  scales.  At  different  scales,  the  images  contain  the  original  information 
from  the  original  images  with  different  details.  Usually  in  stereo  matching,  a  preliminary 
procedure  to  create  a  digital  terrain  model  is  to  use  the  different  details  of  the  original 
information  to  increase  the  processing  speed  and  improve  the  accuracy  of  the  matched 
results.  The  advantage  is  to  reduce  those  mis-matchings  occurring  at  each  scale.  The 
difference  between  LOG  and  Mallat's  wavelet  image  pyramids  are  discussed  as  follows: 

In  the  human  visual  system,  an  image  is  treated  on  several  levels  of  resolution  at  the 
same  time.  The  relevant  details  of  the  image  exist  over  a  restricted  range  of  scale  because 
the  distance  between  an  object  shown  in  the  scene  and  the  optical  center  of  the  camera 
changes.  The  best  way  to  analyze  a  image  is  to  decompose  it  into  a  series  of  images  with 
different  scales'^^"^''i'^'"''*"'*^.  In  general,  there  are  two  basic  methods  to  this  kind  of 
multiresolution  presentation;  one  is  to  use  Laplacian  and  Gaussian  filter  with  different  scales, 
and  another  one  is  to  use  the  wavelet  approach.  Burt'*'  has  implemented  the  approach  to 
extract  the  details  at  each  resolution  2^  and  is  introduced  as  follows. 

Let  the  discrete  approximation  of  a  function,  f(x),  at  the  resolution  2^  yield  A^jf . 
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In  Burt's  image  pyramid,  the  details  of  f(x)  which  appear  in  A^j^if  but  not  inA^jf  are 
extracted.  The  discrete  approximation,  A^j^i  f ,  lias  twice  pixel  numbers  as  many  as  A^jf,  so  A^jf 
needs  to  be  expanded  by  a  factor  of  two;  namely,  a  zero  is  put  between  each  sample  of  A^jf . 
Then,  a  low  -pass  filter  is  applied  on  the  resulting  signal,  and  the  filtered  result  is  defined  as 
Ajf.  The  details,  D^jf,  are  the  difference  between  A^j^if  and  Ajf.  The  procedure  is 
illustrated  in  Figure  2.14  where  Fq  presents  the  low-pass  filter. 

As  for  the  wavelet  approach,  the  details,  D^jf ,  are  calculated  by  convolving  a  high 
-pass  filter  with  Ajj.if  instead  of  calculating  the  difference  between  Ajj.if  andAjf  so  that 
the  processing  speed  can  be  increased.  The  procedure  is  illustrated  in  the  Figure  2.15.  From 
Mallat'^^i,  Burt's  image  pyramid  has  a  property  that  the  data  are  correlated  with  each  other's 
at  different  levels  such  that  it  is  very  difficult  to  fmd  out  whether  a  similarity  between  image 
details  at  different  resolutions  comes  from  a  property  of  the  image  itself  or  is  a  kind  of 
redundant  representation  by  employing  Burt's  approach.  Furthermore,  the  A^jf  is  created 
by  convolving  a  low-pass  filter  with  A^^i  f .  It  is  very  similar  to  the  way  that  A2jf  is  created 
in  Burt's  image  pyramid. 

In  addition  to  having  faster  processing  speed  compared  with  Burt's  approach,  the 
image  pyramid  created  by  employing  the  wavelet  approach  provides  information  about 
spatial  orientations.  The  multiresolution  based  on  the  Laplacian  and  Gaussian  approach  does 
not  provide  any  information  related  to  spatial  orientations  during  decomposing  the  original 
images.  On  the  contrary,  when  wavelet  image  is  employed,  the  processed  image  provides 
several  results  which  contain  the  spatial  information,  like  vertical  and  horizontal  frequency 
information.  Usually,  the  information  related  to  spatial  orientations  can  be  easily  used  to 
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discriminate  texture  for  pattern  recognition  problems'^''l 

In  addition  to  the  structures  of  LOG  pyramid  and  wavelet  pyramid,  the  way  to  extract 
the  features  from  image  pyramids  is  different,  too.  Fundamentally,  the  features  extracted 
from  LOG  image  pyramid  are  those  place  where  zero-crossing  occurs.  The  zero-crossing 
point  usually  occurs  at  the  inflection  point  of  a  function  and  can  be  located  by  taking  the 
derivative  of  the  fiinction.  The  inflection  points  in  a  function  can  be  used  to  indicate  the 
position  of  either  the  local  maximum  or  minimum  value  of  the  fimction  in  a  specified  range. 
If  the  local  maximum  of  the  function  needs  to  be  located,  the  derivative  is  taken  one  more 
time  on  the  derivative  of  the  fimction.  Then,  a  local  maximum  is  located  when  the  values 
of  the  derivative  at  the  neighbors  of  the  inflection  point  are  less  than  that  at  the  inflection 
point.  Sometimes,  this  is  difficult  to  implement  and  takes  much  more  time  to  process  the 
data.  In  contrast,  the  local  maximum  of  the  results  by  employing  wavelet  analysis  can  be 
easily  located  by  taking  the  first  derivative  of  the  function.  However,  zero-crossings  only 
provide  the  information  of  the  point  position  but  do  not  distinguish  small  fluctuations  from 
important  discontinuities  while  the  local  maxima  from  wavelet  analysis  have  this  function 
to  describe  those  small  fluctuations. 

2.4  Photo grammetric  Techniques 

Photogrammetry  is  the  study  of  the  relationship  of  the  physical  objects  and  the 
environment  through  processes  of  recording,  measuring,  and  interpreting  photos  and  pattems 
of  recorded  radiant  electromagnetic  energy  and  other  phenomenat^'l  The  basic  geometric 
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relationship  between  the  object  space  and  photo  coordinate  systems  is  illustrated  in  Figure 
2.16.  The  background  needed  for  this  research  consists  of  the  2D  affine  transformation  and 
bundle  adjustment.  As  for  the  affme  transformation,  its  main  purpose  is  to  provide  a  way 
to  exchange  the  positional  data  between  different  domains.  In  this  research,  a  two- 
dimensional  affme  transformation  is  used  to  transform  from  the  raster  into  the  photo  domain. 
The  bimdle  adjustment  puts  all  of  the  information  together,  including  orientations  of  the 
photos  and  photo-control  points,  to  form  a  set  of  simultaneous  least-squares  equations.  The 
least-squares  constraint  enforces  the  condition  that  the  sum  of  the  squares  of  residuals  is 
minimized.  This  solution  is  the  most  probable  solution,  assuming  normally  distributed 
errors.  These  procedures  are  unportant  to  photogrammetric  data  capture,  and  are  introduced 
in  the  following.  For  convenience,  the  right-hand  coordinate  system  is  used  through  all 
discussions.  '  .  .  • 

2.4-1  Affme  Transformation  ,     .       .  •. 

■  The  affme  transformation  is  a  linear  operator  for  transforming  data  from  one  domain 
to  another  by  assuming  that  a  linear  relationship  exists  between  the  two  domains.  In  general, 
some  control  points  are  selected  from  these  two  domains,  and  these  conmion  control  points 
are  used  to  establish  the  linear  relationship.  From  Figure  2.17,  the  relationships  between  x, 
y  coordinate  system  and  x^,  yA  coordinate  system  are  written  in  Equation  2-20. 


=  (T^  +  S^x) 
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The  relationships  between  x^,  Ya  and  x',  y'  coordinate  systems  is  established  by  Equation 
2-21. 


x"  =   X  cos((j)  -  e)  +  y  sin(^  -  e) 

J,  X  (2-21) 

y  =  -x^sm(q)  -  e)  +  y^cos(q)  -  e) 


If  the  Equation  2-20  is  substituted  into  Equation  2-21,  the  relationship  between  x,  y  and  x', 
y'  can  be  written  as: 


X-  =  (T^  +  S^x)cos((l)  -  e)  +  (Ty  +  Syy)sin(4)  -  e) 
y  =  -(T^  +  S^x)sin((|)  -  e)  +  (T^  +  Syy)cos((j)  -  e) 


For  the  simplicity,  the  Equation  2-22  is  re-written  as  in  linear  form  as  shown  in  Equation  2- 
23. 


X-  =  +  a,x  +  a^y 
y  =  \  +  bjx  +  b^y 


where 

a^  =  T^cos((})  -  e)  +  TySin((j)  -  e) 

a,  =  S^cos((j)  -  e) 
&2  =  SySin((j)  -  e) 

\  =  -T^sin((j)  -  e)  +  TyCos((|)  -  e) 

b,  =  -S^sin((j)  -  8) 
bj  =  SyCos((l)  -  e) 


(2-23) 
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Between  two  different  systems,  the  and  5"^  are  the  scale  factors  in  x  and  y  directions, 
respectively.  The  (j)  is  the  rotation  angle  between  these  two  coordinate  systems,  and  the  s 
is  a  non-orthogonality  factor  between  different  coordinate  systems.  In  the  typical  case,  the 
E  is  small. 

2.4-2  Bundle  Adjustment 

The  bundle  adjustment  combines  all  observations  (  such  as  photo-point  coordinate 
observations,  camera-parameter  observations  and  ground-point  observations  ),  together  in 
a  simultaneous  least-squares  solution'^'.  The  mathematic  models  for  each  of  these  three 
kinds  of  observations  are  discussed  separately.  Then,  a  combined  mathematical  model  is 
presented. 

The  angular  attitude  of  a  photo  can  be  quantified  by  three  rotation  angles  ( co,  (|),  and 
k,  about  X,  y,  and  z  axes,  respectively)  as  illustrated  in  Figure  2.18,  when  an  aerial  photo  was 
taken  such  that  the  photo  coordinate  system  is  not  parallel  to  the  object  space  system. 
Namely,  the  geometric  relationship  illustrated  in  Figure  2.16  cannot  be  applied  unless  some 
modifications  are  made.  Therefore,  it  is  necessary  to  rotate  the  photo  coordinate  to  a 
coordinate  system  which  is  parallel  to  the  object  space  coordinate  system  is  necessary.  The 
relationship  is  shown  in  the  Equation  2-24. 
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(2-24) 


where 


™11  = 

cos^JsinK 

m,2  = 

sin  (0  sin  (j)  cos  K  + 

coswsinK 

m,3  = 

-cos(i)sin({)cosK 

+  sin  0)  sin  K 

m^j  = 

-cos(|)sinK 

m^j  = 

cos  0)  sin  (|)  sin  K  + 

sinwcosK 

-sin  0)  sin  (j)  sin  K 

^  COSWCOSK 

i 

™31  = 

sin(j) 

-sina)cos(j) 

™33  = 

coswsincj) 

The  geometric  relationship  shown  in  Figure  2.16  demonstrates  that  the  exposure  station, 
ground  object  point,  and  its  corresponding  photo  point  all  lie  along  a  straight  line.  This 
condition  is  called  coUinearity,  and  its  mathematic  representation  is  shown  as  folio ws'^*': 

m3,(Xj  -  X,'^)  +  m3,(Yj  -  Y^"^)  +  m33(Z.  -  Z,"") 


40 


— 2 


^m„(X.  -  X ^  m„(Y.  -  Y ^  111,3(2.  -  Z,) 
m3,(X.  -  Xi"^)  +  m3,(Y.  -  Y.'^)  +  m^^iZ.  -  Z^) 


(2-26) 


where: 


x.j  ,  :  the  photo  coordinate  of  the  j-th  ground  point  on  the  i-th  photo. 
Xp  ,  Yp  :  the  photo  coordinate  of  the  principal  point  of  a  photo. 

X..  =  x  .  -  X 
y       y  p 


r..  =  (x..    +  v..  ) 


k,  ,      ,  kj  :  the  correction  coefficients  for  Gaussian  symmetric  radial  distortion. 

p,  ,  P2   :  the  correction  coefficients  for  decentering  distortion. 

c        c  c 

Xj   ,  Yj   ,  Zj   :  the  object  space  coordinates  of  the  i-th  exposure  station. 

X.  ,  Yj  ,  Zj  :  the  object  space  coordinate  of  the  j-th  ground  point. 

f :  the  focal  length  ( principal  distance )  of  the  camera. 

m's  :  rotation  matrix  elements. 
The  collinearity  equations,  Equations  2-25  and  2-26,  are  linearized  by  applying  Taylor's 
series  expansion  about  the  measured  photo  coordinate  observations  x^^  ,  y.. ,  and  initial 
approximations  for  the  other  unknown  parameters.  The  linearized  forms  are  written  with 
respect  to  the  x  and  y  direction,  respectively,  as  follows: 
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0  =  (F)„  +  (— )„Ax..  +  (  — )„Ax    +  (-^)  Ay..  +  (  — )„Ay  + 
(^),Ak,  !  (^)„Ak,  I  (^)„Ak3  I  (iL)„Ap,  1  (^)„Ap, 


ak, 
ap 


ak. 


ak 


ap 


ap. 


ap 


ap 


ap 


(— )„Af  +  (— )„Aa)  +  (— )„A(J)  +  (— )„Ak  + 


df 


ao) 


dK 


(2-27) 


ax. 


av, 


az. 


.  ap .  A  ,r    /  ap .  . .  ap .  .  ^ 

(  )nAX.    +   (  )nAY.    +   (  )nAZ. 


av. 


az. 


0  =  (G).  +  (  )nAx..  +  (  ).Ax    +  (  )nAy..  +  (  )„Ay  + 

"     ax.  "  'J     ax  "   "     ay  .  °   'J     ay  °  p 

IJ  p  •'ij  ■'p 

( AX.  t  (-^)„AY.  *  (■^j.AZ, 

ax,  °   '    dY,  °   •    az,  "  ' 


where: 


P  =  X 


\  +  ^Cl^.^ij  +  P.C^ij  +2Xy  )  +  2p,x..y.,  + 

^m„(X.-X,")  ^  m„(Y.-Y,")  +  m,3(Z.-Z,") 
m3,(X,-X.'^)  +  m3,(Y,-Y,")  +  m33(Z.-Z,'=) 
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G  =  v..  -  y    +  y..(k  r,^ +k  r.^ +k  r.*)  +  2p  x.y..  +  pAt^ +2y^)  + 


^m^,(X.-X-)  .m^^(Y.-Y-)  ^m„(Z.-Z^-) 


For  convenience,  the  Equations  2-27  and  2-28  is  re-written  in  symbolic  form  as  follows. 

V.  +  bA  +  B  A.  +  B  A.  =  e..  (2-29) 

As  for  each  derived  component,  its  detail  can  be  found  at  Appendix. 

For  ground  coordinate  observations,  the  observation  equations  for  the  j-th  ground 
point  are: 


Ax 

J 

\ 

AY 

J 

_  Y  m 

j 

Az 

J  . 

-zr  _ 

(2-30) 


where  a  and  m  represent  the  initial  approximations  and  measured  values,  respectively.  For 
simplicity,  the  Equation.  2-29  is  re-written  as 


V   -  A   =  C 
J       J  J 


(2-31) 


For  the  camera  parameter  observations  of  the  i-th  photo,  it  is  represented  as  follows: 
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1 

-  (w.r 

A(j). 

-  i^X 

Ak 

1 

-  (K^r 

V  c 

AXj'' 

-  (Xi'^r 

AY."" 

-  (Yj'^r 

(2-32) 


or  it  can  represented  in  symbolic  form: 


(2-33) 


After  combining  Equations  2-29, 2-31,  and  2-33,  all  observation  equations  are  put  together 
to  form 
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(2-34) 


The  stereomodel  parameters  are  established  by  employing  the  iterative  least-squares  method 
until  the  increment  is  insignificant.  As  for  an  efficient  data  storage  scheme,  the  Manual  of 
Photogrammtery  provides  a  more  detailed  discussion^^l 


2.4-3  Data  Interpretation 

To  accurately  represent  a  surface  using  limited  discrete  values  is  difficult.  In  general, 
two  kinds  of  existing  approaches  have  been  widely  adapted  to  interpreted  data;  one  is  grid- 
based  and  the  other  is  triangulation-based.  Triangulation-based  terrain  modeling  has  been 
adapted  and  applied  to  represent  the  physical  surface  of  earth,  and  has  been  implemented  in 
recently  developed  software,  like  GEOPAK.  The  basic  idea  of  triangulation-based  terrain 
modeling  is  to  partition  the  space  into  a  set  of  triangles  by  associating  all  locations  in  that 
space  with  a  finite  set  of  distinct,  isolated  points  in  a  continuous  space The  reasons  to 
use  this  approach  are:  first,  that  every  measured  data  point  is  used,  and  data  density  can  be 
adjusted  according  to  topographic  variation.  For  those  additional  points  whose  heights  are 
needed  to  be  determined,  the  heights  are  interpolated  from  the  vertices  of  the  triangles  in 
which  the  additional  points  lie.  The  second  reason  is  that  for  break-lines,  fault-lines,  and  so 
on,  the  triangulation-based  method  can  accurately  and  efficiently  describe  these  shapes  '"^l. 

Grid-based  terrain  modeling  seems  to  be  the  most  popular  approach  to  represent  the 
physical  surface  of  the  earth.  The  basic  idea  of  the  grid-based  approach  is  to  create  a 
rectangular  pattern  of  grid  cells  from  the  measured  data,  and  then  these  grids  are  used  to 
represent  the  surface  of  an  area.  Figure  2.19  shows  two  kinds  of  grid  patterns;  one  is  a 
square  pattern,  and  the  other  is  a  rectangular  pattern.  As  for  the  interpolation  method,  in 
general,  the  number  of  the  nearest  neighbors  is  pre-selected  in  order  to  use  these  nearest 
neighbors  to  interpolate  a  particular  point  determined  by  grid-based  terrain  modeling. 
Sometimes  the  number  of  nearest  neighbors  is  not  enough  to  give  an  accurate  interpolation. 
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The  auxiliary  criterion  is  established  by  specifying  the  search  range  with  the  interpolated 
point  as  the  center.  Both  criteria  for  interpolating  are  presented  in  Figure  2.20.  After  the 
grid  data  points  have  been  interpolated  from  those  measured  points,  a  polynomial  or  spline 
fiinction  is  built  by  applying  the  selected  data  points.  From  previous  research  f^^',  the 
interpolated  results  from  polynomial  and  spline  functions  are  not  noticeably  different,  but 
if  the  order  of  the  selected  polynomial  is  too  large,  the  difference  between  polynomial  and 
spline  fiinction  can  be  increased.  Other  methods,  like  minimum  curvature  '"^ have  been 
widely  used  in  interpolating  the  datum  transform. 
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CHAPTER  3 

EXPERIMENTAL  MATERIALS  AND  EQUIPMENT 
3.1  Introduction 

The  stereopair  photographs  used  in  this  research  cover  the  main  campus  of  University 
of  Florida,  in  Gainesville,  and  the  size  of  each  photograph  is  9  by  9  in.  The  main  computer 
equipment  used  in  this  research  is  a  Hewlett  Packard  workstation  700  series  with  64  MG 
RAM. 

In  this  chapter,  Section  3.2  introduces  the  details  of  photographs  used  in  this  research; 
Section  3.3  presents  the  basic  information  about  the  Hewlett  Packard  workstation;  Section 
3.4  shows  the  software  applied  in  this  research. 

3.2  Imagery 

Imagery  used  in  this  research  includes  a  stereo  pair,  which  was  taken  by  Aerial 
Cartographies  of  America,  Inc.,  Orlando,  Florida,  on  the  17th,  December  1993.  The  flying 
height  is  about  480  meters  above  mean  terrain  and  the  scale  of  the  photo  is  1 :3000. 

Detailed  information  for  the  aerial  camera  used  to  take  the  photographs  is  given  in 
Table  3-1  t'l. 
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Table  3-1  Aerial  Camera  Information 


Items 

Content 

Camera  Type 

Wild  RC8/5 

Camera  Serial  No. 

302 

Lens  Type 

Wild  Universal  Aviogon 

Lens  Serial  No. 

UAG  389 

Focal  Length 

153  mm 

Traditionally,  an  aerial  photograph  has  four  fiducial  marks  located  at  the  comers  and 
four  fiducial  marks  located  at  the  middle  position  of  the  sides.  Figure  3.1  shows  their 
relative  positions  in  a  photograph.  Their  calibrated  coordinates  are  shown  in  the  Table  3-2^''. 


Table  3-2  Calibrated  Coordinates  of  Fiducial  Marks. 


Points 

X  coordinate  (mm) 

y  coordinate  (mm) 

1 

-105.992 

-105.983 

2 

106.006 

105.984 

3 

-105.992 

106.005 

4 

106.009 

-106.000 

5 

-109.994 

0.017 

6 

110.006 

0.002 

7 

0.012 

110.000 

8 

0.015 

-109.997 
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Figures  3.2  and  3.3  show  the  digital  images  obtained  by  scanning  the  aerial 
photographs  with  40  nm  per  pixel.  The  number  of  rows  and  columns  of  the  digitized  image 
are  the  same:  5756.  The  disk  space  required  to  store  the  digitized  photographs  is  about  30 
megabytes  each.  The  basic  information  for  the  scanner  used  to  digitize  the  aerial  photograph 
is  given  in  Table  3-3. 


Table  3-3  Scarmer  Information 


Item 

Content 

Manufacturer 

Vexcel 

Model  No. 

3000 

Type 

Flatbed  scanner 

Geometric 

±  0.002  mm 

Accuracy 

3.3  Computer  Hardware 

The  computer  hardware  used  in  this  research  is  a  Hewlett  Packard  workstation.  The 
model  type  is  Hewlett  Packard  715-50.  The  workstation  is  connected  to  the  network  of  the 
Surveying  and  Mapping  Program,  Department  of  Civil  Engineering.  Another  instrument 
used  in  this  research  is  the  analytical  stereo  plotter,  Kern  DSR-14,  shown  in  the  Figure  3.4. 
The  stereo  plotter  was  used  to  collect  the  terrain  elevation  data  from  the  stereopair  in  order 
to  compare  the  collected  data  from  developed  system.  The  stereo-plotter  has  been  upgraded 
from  DSR-1  to  DSR-14  and  it  has  a  ±  0.001  mm  precision. 
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3.4  Software 
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Fundamentally,  the  soflAvare  used  in  this  research  is  divided  into  two  kinds;  the  first 
kind  of  software  was  developed  by  the  author,  the  second  kind  of  software  was  provided  by 
Dr.  Bon  Dewitt  and  commercial  software.  The  software  developed  by  author  is  introduced 
in  the  next  chapter.  The  software,  SCBUN  (  Self  Calibration  of  Bundle  Adjustment ),  was 
developed  by  Dr.  Bon  Dewitt  to  calculate  the  camera  parameters  of  photos,  and  its 
computational  algorithm  has  been  introduced  in  the  previous  chapter.  The  commercial 
software,  SURFER,  developed  by  Golden  Software  Inc.,  is  used  to  create  the  grids  fi'om  the 
collected  data.  .-     *  *  ,    ■        ■  ' 


CHAPTER  4 
METHODOLOGY 


4.1  Introduction 

This  chapter  presents  a  schematic  system  which  can  reduce  human  operation  on 
photogrammetric  data  capture  with  assistance  from  a  computer.  As  illustrated  in  Figure  4. 1 
this  system  includes  different  operations,  like  interior  orientation,  relative  orientation, 
absolute  orientation,  stereo  matching,  and  digital  terrain  generation. 

Traditionally,  photogrammetric  data  capture  has  been  a  labor  intensive  job  and 
heavily  depends  upon  operator's  experience  and  judgement.  The  developed  system  can 
reduce  the  operator's  working  burden  for  DTM  production,  and  only  leaves  the  operator  to 
decide  what  kind  of  results  can  be  accepted.  A  pattern-recognition  technique  with  Fourier 
transform  and  new  strategy  to  locate  the  stereo  correspondences  of  the  given  stereo  pair  are 
applied  in  the  developed  system.  »  . 

In  this  chapter.  Section  4.2  shows  a  pattern-recognition  technique  with  Fourier 
transform  used  to  accurately  locate  the  fiducial  marks  from  the  digitized  photos;  Section  4.3 
demonstrates  how  to  calculate  the  camera  parameters  by  employing  the  bundle  adjustment; 
Section  4.4  presents  a  new  strategy  to  locate  the  stereo  correspondences  from  the  given 
stereo  pair;  finally,  Section  4.5  shows  how  to  generate  the  digital  terrain  model. 
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Interoir  Orientation 
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Bundle  Adjustment 


Camera  Parameters 


Camera  Parameters 


Stereo  Matching 


Digital  Terrain  Model 


Figure  4.1  Process  flow  in  the  basic  modules  of  the  developed  DTM  generation 
system. 


4.2  Interior  Orientation 
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Interior  orientation  is  an  operation  to  make  sure  that  the  center  of  a  photograph  can 
be  used  as  an  origin  point  such  that  any  photo  measurement  is  referenced  to  the  principal 
point.  Fundamentally,  the  first  step  in  interior  orientation  of  a  digitized  photograph  is  to 
locate  the  fiducial  marks'  positions  in  the  given  photographs.  There  are  two  approaches 
available;  one  is  to  apply  a  manual  approach,  and  another  one  is  to  use  a  computer  with 
pattern  recognition  techniques.  In  fact,  the  manual  approach  is  traditional ,  and  has  been 
used  in  the  past.  The  manual  approach  needs  a  human  operator  to  watch  the  digitized  photos 
showing  on  a  display  device  like  a  monitor,  simultaneously  search  for  the  fiducial  marks  on 
the  display  device,  and  then  push  a  point  device  to  record  the  measured  results.  Often,  this 
operation  needs  to  be  repeated  several  times  till  the  standard  deviation  of  the  measured 
results  is  less  than  a  pre-established  threshold.  It  is  a  tedious  and  time-consuming  job.  \ 

Recently,  with  the  advance  of  computer  technologies,  to  automatically  locate  the 
position  of  an  object  in  a  digitized  photograph  has  been  proposed  and  applied  to  find  the 
positions  of  fiducial  marks  of  a  digitized  photograph.  For  example,  one  approach,  using  the 
Hough  transform  to  transform  the  image  coordinates  fi-om  the  Cartesian  coordinate  system 
to  the  polar  coordinate  system,  picking  the  magnitudes  and  angles  fi-om  the  polar  coordinate 
system  which  are  similar  to  the  magnitudes  and  angles  of  the  fiducial  marks  from  a  Hough 
array,  and  then  applying  image  correlation  to  find  out  where  the  correlation  coefficient  is 
larger  than  the  others,  has  been  proposed'^^l  Of  course,  the  sizes  and  shapes  of  the  fiducial 
marks  must  be  known  at  first.  This  approach  can  reach  a  high  degree  of  accuracy  but  this 
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approach  can  only  be  applied  for  those  fiducial  marks  with  a  particular  shape,  like  a  cross. 
Therefore,  it  is  necessary  to  build  a  more  robust  approach  to  locate  the  positions  of  the 
fiducial  marks. 

Dr.  Sharon  Wang  has  proposed  a  new  approach,  developing  a  Four-Dimensional 
Hyper-Image  Space  to  locate  the  position  of  an  object'''*'.  The  Four-Dimensional  Hyper- 
Image  Space  contains  the  shifts  of  different  coordinate  systems  in  the  x  and  y  direction 
respectively,  the  rotation  angle  between  different  coordinate  systems,  and  the  scale  factor 
between  coordinate  systems.  The  basic  assumption  made  is  that  for  a  pre-built  template,  if 
its  location  is  to  be  located  in  a  scene  containing  an  image  resembling  the  template,  then 
there  exists  a  conformal  transform  such  that  the  pre-built  template  showing  in  the  scene  is 
under  the  effects  of  conformal  mapping.  The  first  procedure  from  her  theory  is  to  transform 
the  pre-built  template  and  the  scene  containing  the  deformed  template  into  the  proposed  4D 
Hyper-Image  Space.  Then,  a  phase  filter  is  created  from  the  pre-built  template  image,  and 
the  filter  is  convolved  with  the  scene  containing  the  deformed  template  in  the  proposed 
space.  Finally,  the  convolved  results  are  transformed  back  to  a  coordinate  system.  In  that 
coordinate  system,  the  position  of  the  pre-built  template  is  located  at  the  place  with  the 
highest  gray-value.  As  for  the  rotation  angle  between  different  coordinate  systems  and  the 
scale  factor,  their  values  are  located  at  corresponding  positions  on  the  x  and  y  axes  in  the 
coordinate  system. 

After  I  discussed  with  Dr.  Wang,  a  simplified  approach  of  her  Four-Dimensional 
Hyper-Image  Space  which  can  be  used  to  locate  the  fiducial  marks  form  a  digitized  photo 
was  devised  and  is  described  as  follows:  Let  the  image  fiinction,  f  ( x ,  y ) ,  represent  the  scene 
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which  contains  a  template  function,  t(x,y),  and  the  background  effect,  b(x,y).  The 
mathematical  representation  is  written  as  Equation  4-1: 

f(x,y)  =  t(x,y)  +  b(x,y)  (4-1) 

Then,  taking  Fourier  transform  on  the  template  yields 

T(u,v)  =  IJ  t(x,y)  e-j^'^^""  ^^Mxdy  (4-2) 

Therefore,  the  phase  filter  is  created  as  follows: 

G*(».v)  =  (4-3) 
|T(u,v)| 

where  *  indicates  the  complex  conjugate.  The  Fourier  transform  of  the  image  function, 
f(x,y),  is  represented  as  follows: 

F(u,v)  =  Jf  f(x,y)e-j^"('"' ^^Mxdy  (4-4) 

Then,  the  correlation  of  F(u,v)  andG,jj(u,v)  is: 

C(u,v)  =  F(u,v)  G^(u,v)  (4-5) 

Finally,  taking  inverse  Fourier  transform  on  C  ( u ,  v )  yields 

c(x,y)  =  |||  C(u,v)eJ^"('"' ^^Mxdyl         ^^      .  (4-6) 

The  c(x,y)  represents  the  magnitude  of  the  inverse  Fourier  Transform  of  C(u,v).  The  location 
of  the  template  shown  in  the  image,  f(x,y),  is  located  at  the  maximum  value  of  c(x,y)  if  the 
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template  exists  in  the  image.  This  approach  acts  as  other  high  discriminating  filters,  such  as 
an  inverse  filter,  but  it  is  more  stable'^'*'.  After  the  whole  procedure  has  been  done,  the 
iterative  least-squares  method  is  applied  to  refine  the  position  to  the  sub-pixel  level.  The 
whole  system  is  illustrated  in  Figure  4.2.  Appendix  B  explains  the  reason  why  the  Fourier 
transform  works  well  to  locate  a  template's  position  from  a  scene  containing  the  template, 
and  shows  examples  of  using  the  Fourier  transform  to  locate  a  template's  position  from  a 
scene  containing  the  template. 

After  the  fiducial  marks  have  been  located,  the  image  coordinates  of  the  fiducial 
marks  and  their  corresponding  coordinates  shown  in  the  Table  3-2  are  used  to  establish  a 
linear  relationship  with  2-D  affine  transform  such  that  the  image  coordinates  can  be 
transformed  to  the  photo  coordinates  and,  therefore,  transformed  coordinates  can  apply  the 
developed  geometric  relationships  from  photos.  The  2-D  affine  transform  is  shown  as 
follows: 

=  a.  +  a,  X  +  a-v 

photo        0         1  2^  (4_7^ 

yphoto  =  \  ^^^^ 


where  the  x  and  y  represent  the  components  of  the  image  coordinate  in  x  and  y  directions. 
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Figure  4.2.  Process  flow  for  locating  the  positions  of  fiducial  marks  from  a  digitized 
photograph  in  the  developed  system. 


4.3  Bundle  Adjustment 
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Bundle  adjustment  provides  a  way  to  precisely  determine  the  orientation  of  photos 
when  the  photos  were  taken.  Before  the  bundle  adjustment  can  be  executed,  some  control 
points  imaged  in  the  stereo  pair  must  be  located.  The  object-space  coordinates  of  these 
control  points  are  given,  or  determined  by  the  other  surveying  techniques,  such  traditional 
ground  survey  or  the  Global  Positioning  System  ( GPS  ).  The  reason  to  employ  the  control 
points  with  known  object-space  coordinates  is  to  establish  a  linear  relationship  between  the 
photo  and  a  real  world  coordinate  system  such  that  the  3-D  information  can  be  obtained  from 
2-D  images  with  additional  conditions. 

4.3-1  Finding  Control  Points  in  a  Stereo  Pair 

li  ■■'  .  - 

"When  a  stereo  pair  is  given,  establishing  a  relationship  between  measured  photo 
coordinates  and  real-world  coordinates  depends  upon  some  control  points  whose  real- world 
coordinates  are  known.  To  correctly  locate  the  coordinates  of  these  control  points  in 
digitized  images  is  important  for  establishing  the  relationship.  So  far,  there  is  no  perfect 
method  which  can  automatically  and  correctly  identify  these  control  points  in  the  digitized 
images  except  those  control  points  with  some  particular  pattern. 

Finding  the  common  control  points  in  a  stereo  pair  still  needs  a  manual  operation. 
In  general,  the  control  points  showing  in  the  stereo  pair  are  often  triangulated  from 
photographic  measurements  such  that  the  time  and  money  spent  on  a  ground  survey  can  be 
avoided.  Sometimes  the  locations  for  control  points  are  selected  at  the  comers  of  buildings, 
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intersections  of  roads  and  so  on.  These  images,  without  some  particular  pattern,  are  difficult 
to  identify  by  existing  pattern  recognition  techniques. 

In  this  research,  the  control  points  were  derived  from  a  block  of  photos  by  employing 
aero-triangulation,  and  they  do  not  have  a  standard  pattern  that  could  be  used  to  easily 
identify  them  by  computer  processing  techniques.  Therefore,  a  manual  operation  was 
necessary  to  identify  the  locations  of  the  control  points  from  the  digitized  images.  Initially, 
a  sub-image,  whose  size  is  128  by  128,  with  a  control  point  at  the  center,  was  extracted  from 
the  left  image  of  the  given  stereo  pair  by  a  manual  operation  with  assistance  from  a  pointing 
device.  Then,  another  sub-image  with  the  same  control  point  at  its  center  was  extracted  from 
the  right  image  of  the  given  stereo  pair  by  the  same  procedure.  These  two  procedures  are 
repeated  several  times  to  make  sure  that  the  position  of  the  confrol  point  is  correctly  located. 
After  the  sub-images  are  exfracted  from  the  given  stereo  pair,  the  iterative  least-squares 
approach  is  applied  to  more  precisely  identify  the  relative  location  of  the  control  point  at 
different  sub-images,  respectively.  These  points  could  then  be  used  to  establish  relative  and 
absolute  orientation  of  the  stereopair.  Due  to  the  abundance  of  aerotriangulated  ground 
control  points,  these  points  could  serve  the  dual  purpose  of  relative  and  absolute  orientation. 
Mathematically,  a  minimum  of  five  control  points,  known  as  pass  points,  are  needed  to 
restore  the  relative  geometric  condition  that  exists  when  the  photos  were  taken.  Furthermore, 
six  or  more  pass  points  are  better  in  order  to  apply  the  least-squares  approach  to  increase  the 
reliability  of  the  orientation. 
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4.3-2  Transforming  Coordinates  from  the  Digitized  Image  into  the  Photo  Coordinate  System 


In  photogrammetry,  the  geometric  conditions  are  not  based  on  the  digital  image 
coordinate,  but  are  based  on  photo  coordinates.  Therefore,  the  collected  coordinates  from 
the  digitized  images  need  to  be  transformed  into  the  photo  coordinate  system  in  order  to 
apply  the  developed  geometric  conditions  between  photos  and  a  real-world  coordinates.  The 
2D  affine  fransform,  illustrated  in  Equation  4-7,  provides  a  way  to  transform  the  coordinates 
from  the  digitized  image  into  the  photo  coordinate  system. 
4.3-3  The  Relative  and  Absolute  Orientations  of  A  Given  Stereopair 

Usually,  at  least  six  control  points  existing  in  the  both  images  of  the  given  stereo  pair 
are  used  to  establish  the  stereomodel  parameters  for  relative  orientation.  The  stereomodel 
parameters  contain  the  rotations  about  x,  y,  and  z  axes,  and  the  relative  coordinates  of  the 
exposure  stations  that  existed  when  the  photo  was  taken.  There  are  only  two  lines  existing; 
namely,  one  line  is  from  the  exposure  station  of  the  left  image  of  the  given  stereo  pair, 
through  the  position  of  a  particular  object  showing  in  the  left  photo  of  the  given  stereo  pair, 
and  the  corresponding  position  in  the  object  space.  Another  line  is  from  the  exposure  station 
from  the  right  image  of  the  given  stereo  pair,  through  the  position  of  the  same  object 
showing  in  the  right  photo,  and  the  corresponding  position  in  the  object  space.  In  fact,  these 
two  lines  carmot  be  used  to  correctly  determine  a  object's  coordinates  in  object  space  from 
the  given  stereo  pair  unless  additional  constraints  conditions  are  provided.  In  order  to  find 
a  reliable  solution  for  stereomodel  parameters,  there  are  two  kinds  of  procedures  that  need 
to  be  performed.  The  first  procedure  is  to  reduce  the  numbers  of  unknown  parameters  in  the 
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stereomodel.  For  example,  the  rotations  about  x-axis,  y-axis,  and  z-axis,  and  the  coordinate 
of  the  exposure  station  of  the  left  image  are  set  to  be  zero  to  reduce  the  unknown  parameters 
of  the  stereomodel.  Then,  the  Equation  2-28  can  be  used  to  form  the  equations  for  the 
orientations  about  x-axis,  y-axis,  and  z-axis  (o),  (J),  K,  respectively  )  and  the  coordinates  of 
the  exposure  station  of  the  right  image  related  to  the  left  image  such  that  an  iterative  least- 
squares  approach  can  be  applied  to  find  the  most  reliable  solutions  for  the  stereomodel 
parameters.  The  whole  procedure  is  called  relative  orientation. 

Then,  the  second  procedure  is  employed  to  establish  the  relationship  between  the 
stereomodel  referenced  to  the  parameters  of  the  exposure  station  for  the  left  image  and  object 
coordinate  system.  This  procedure  are  called  absolute  orientation.  Absolute  orientation  is 
an  operation  to  orient  the  relative  stereomodel  to  fit  in  the  coordinate  system  of  the  real 
world.  Control  points  whose  real-world  coordinates  are  knovra  play  an  essential  role  in 
establishing  the  relationship  between  the  relative  stereomodel  and  real- world  coordinates. 

The  equations  can  all  be  put  together  to  apply  an  iterative  least-squares  approach  to 
find  the  solution.  This  procedure  is  called  bundle  adjustment.  Bundle  adjustment  is  an 
approach  to  adjust  the  photo-point  observations,  camera  parameters,  and  ground-  control- 
point  observations  together  by  employing  an  iterative  least-squares  approach.  The  whole 
procedure  is  represented  in  Equation  2-33  and  illustrated  in  Figure  4.3. 
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Figure  4.3  Procedure  to  determine  stereomodel  parameters  of  the  given  stereopair. 
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4.4  Stereo  Matching 

Stereo  matching  is  an  operation  to  locate  the  stereo  correspondences  from  a  given 
stereopair.  Since  1960,  many  theories  have  been  proposed  to  locate  the  stereo 
correspondences  from  a  given  stereo  pair.  In  fact,  how  to  efficiently  and  accurately  locate 
the  stereo  correspondences  from  a  given  stereo  pair  is  a  major  research  topic  in  the  fields  of 
computer  vision  and  photogrammetry  because  there  is  still  not  a  perfect  method  m  existence 
which  can  handle  the  stereo  matching  problem.  However,  the  proposed  theories  and 
approaches  can  be  roughly  divided  into  two  groups;  namely,  one  is  an  area-based 
appToach^™^'^^^^°™°^^'^\  and  another  one  is  a  feature-based  approadf^it^^"'**!!^'!  .  For  a 
photogrammetrist,  not  all  proposed  theories  can  be  applied  in  digital  photogrammetry 
because  of  the  properties  of  an  aerial  photo. 

A  stereo  pair  of  aerial  photos  poses  difficulty  in  stereo  matching  because  the  shape 
of  a  feature  can  be  distorted  from  one  photo  to  the  next.  An  aerial  photo  is  taken  over  a 
specific  area,  and  the  photo  is  formed  through  projection  geometry.  Sometimes,  the  shapes 
shown  in  the  given  stereo  pair  for  the  same  ground  object  will  be  different  because  of 
perspective  projection  geometry.  An  example  of  this  phenomenon  is  illustrated  in  Figure 
4.4.  Therefore,  at  some  particular  locations,  the  assumption  that  there  is  a  simple  one-to-one 
relationship  such  that  for  a  point  shown  on  the  left  image  of  a  given  stereo  pair,  there  is  one 
and  only  one  point  which  is  the  stereo  correspondence  shown  on  the  right  image,  caimot 
stand  unless  perspective  distortion  is  taken  into  account.  The  main  reason  is  that  when  the 
photo  was  taken,  the  viewing  position  has  been  changed  because  of  the  movement  of  the 
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Figure  4.4  (b).  The  sub-image  extracted  from  the  right  photo  contains 
a  stairway  with  projective  distortion. 
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airplane.  The  change  of  the  viewing  position  means  that  some  points  can  be  hidden  from 
the  new  viewing  position.  Therefore,  the  simple  one-to-one  relationship  does  not  apply. 

Furthermore,  the  parallax  displacements  are  not  uniform  through  the  given  stereo 
pair.  For  those  proposed  theories  using  a  window  with  fixed  size  to  calculate  the  correlation 
or  sum  of  squared  differences  (SSD)  for  stereo  matdang^*'^^^*^^^^^^^^^^^^^^^^^  it  is  difficult  to  find 
the  correct  stereo  correspondences  whose  parallax  displacement  is  large.  The  drawback  for 
employing  a  window  with  fixed  size  is  that  if  the  parallax  displacement  is  small  and  a 
window  with  large  size  is  employed,  then  the  processing  time  for  stereo  matching  will 
significantly  be  increased,  and  the  covered  area  is  so  large  such  that  the  positions  determined 
by  applying  maximum  correlation  coefficient  or  the  minimum  SSD  may  not  correctly 
represent  the  matched  result  due  to  different  projective  distortions  in  the  given  stereo  pair. 
On  the  other  hand,  if  the  parallax  displacement  is  large  and  a  window  with  small  size  is 
employed,  then  the  covered  area  is  so  small  such  that  the  estimated  positions  may  not  be 
correct''''^. 

Traditionally,  locating  the  stereo  correspondences  from  a  given  stereo  pair  has 
required  substantial  computation.  Therefore,  epipolar  geometry  was  exploited  to  reduce  the 
search  range  from  2D  to  ID  such  that  processing  time  can  be  significantly  reduced.  Epipolar 
geometry  is  illustrated  in  Figure  4.5.  Fundamentally,  for  any  point  shown  in  the  i-th  image, 
if  its  corresponding  object  coordinates,  (  Xp,Yp,  Zp  ),  are  approximately  known,  then  its 
corresponding  point  shown  in  the  j-th  image  can  be  roughly  calculated  by  employing 
coUinearity  equation.  In  doing  so,  the  search  range  int  the  right  image  will  be  reduced.  For 
many  positions,  the  ground  coordinates  are  hardly  known  and  cannot  be  estimated  though 
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epipolar  geometry  still  holds.  The  search  range  cannot  be  reduced. 

From  the  above  discussion,  a  developed  stereo  matching  system  needs  to  satisfy  the 
following  conditions: 

1.  The  system  must  provide  a  best  approximation  when  projective  distortions  occur, 

2.  The  system  needs  to  provide  a  flexible  search  range, 

3.  The  system  can  complete  the  stereo  matching  job  in  a  limited  time. 

In  stereo  matching,  it  may  happen  that  many  points  are  located  in  the  left  image  of  a  given 
stereo  pair,  but  there  is  one  and  only  one  corresponding  point  which  can  be  located  from  the 
right  image  because  the  one-to-one  image  similarity  relationship  does  not  stand  any  more. 
How  to  solve  this  problem  is  seldom  discussed  in  developed  stereo  matching 
systemst^'it''*J["]["l  In  this  research,  the  developed  system  has  the  capability  to  provide  a 
criterion  to  decide  which  point  shown  on  the  right  image  can  be  the  stereo  correspondence 
of  the  point  shown  on  the  right  image  by  comparing  the  similarity  of  DN  distributions. 

In  general,  at  the  beginning  of  stereo  matching,  it  is  not  known  where  the  stereo 
correspondences  of  a  given  stereo  pair  are.  Therefore,  for  a  particular  point  shown  on  the  left 
image,  the  location  of  its  stereo  correspondence  is  estimated  by  "guessing."  In  order  to 
reduce  computation  intensity,  a  window  with  a  fixed  size  is  applied  in  the  right  image  to 
locate  the  stereo  correspondence  by  a  series  of  procedures,  like  calculating  the  correlation 
coefficient.  For  some  cases,  this  window  with  the  fixed  size  works  well  to  locate  the 
corresponding  point  from  the  right  image  but  when  the  corresponding  point  is  not  in  the 
search  range  of  the  window,  the  stereo  matching  system  will  fail.  In  this  research,  the 
developed  system  has  the  capability  to  extend  the  size  of  the  window  such  that  no  matter 


72 

what  kinds  of  parallax  displacements  are,  they  will  fall  in  the  window. 

Furthermore,  the  third  requirement  for  the  stereo  matching  system  is  time  efficiency. 
The  processing  time  for  stereo  matching  is  an  important  factor  but,  so  far,  the  processing 
time  is  seldom  discussed  in  the  developed  system.  Even  though  the  stereo  matching  is  a 
highly  computation  intensive  job,  the  whole  procedure  still  needs  to  be  completed  in  a 
reasonable  time.  For  developing  a  system,  if  an  operator  can  complete  a  stereo  matching  job 
by  operating  a  stereo  plotter  in  a  hour,  it  is  of  questionable  value  to  develop  a  stereo 
matching  system  to  complete  the  same  matching  job  in  days.  The  developed  system  can 
complete  the  matching  job  in  a  reasonable  time  because  the  developed  system  contains 
different  criteria  such  that  the  search  and  processing  time  can  be  significantly  reduced. 
4.4-1  Introducing  the  Developed  System 

The  developed  system  contains  three  parts,  image  decomposition,  matching,  and 
redundancy  elimination.  The  system  is  illustrated  in  Figure  4.6.  Image  decomposition  is  to 
used  to  extend  the  original  images  into  a  series  of  images  with  different  scales.  These 
images  with  different  scales  contain  the  information  of  the  scene  with  different  degrees  of 
detail.  The  idea  of  this  system  is  to  extract  information,  like  edges,  and  find  the  stereo 
correspondences  from  the  left  and  right  image  at  the  coarsest  scale  because  at  this  scale,  only 
limited  information  needs  to  be  considered.  Then,  the  matched  results  from  the  images  at 
the  coarsest  scale  are  used  to  limit  the  search  range  on  the  images  at  a  finer  scale.  Then,  the 
matched  results  need  to  be  checked  to  avoid  the  phenomenon  that  the  same  matched  result 
shows  up  multiple  times.  This  procedure  is  executed  until  all  the  images  with  different 


scales  have  been  processed.  This  kind  multiresolutional  approach  has  become  a  tendency 
in  the  developing  the  stereo  matching  system  because  it  has  fast  processing  speed  and  those 
false  information  created  by  decompose  a  original  image  can  be  eliminated  by  employing 
the  approach  mentioned  above  to  increase  the  matching  accuracy. 

Section  4.4-2  introduces  the  geometric  constraint  condition  used  in  the  stereo 
matching;  Section  4.4-3  presents  the  criterion  to  measure  the  similarities  to  make  sure  that 
the  matched  results  are  correct;  Section  4.4-4  and  4.4-5  present  the  matching  procedures  at 
different  stages. 
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Figure  4.6  Process  flow  in  the  stereo  matching  system. 


4.4-2  Geometric  Constraint  Condition 

The  coUinearity  condition  equations  for  the  i-th  photo  and  the  j-th  photo  are 
developed  from  similar  triangles  from  Figure  4.5  as  follows: 
For  the  i-th  photo: 


w,  1 

(4-8) 


X    -  X        Y    -  Y        Z    -  Z  A 
pi         p        1         p        1  1 


For  the  j-th  photo: 


X    -  X       Y    -  Y       Z    -  Z  X. 
p        J         p        J        p       J  J 


(4-9) 


Originally,  the  photo  coordinate  system  is  not  parallel  to  the  ground  (  object  ) 
coordinate  system.  Therefore,  the  original  coordinate  system  needs  to  be  rotated  into  a 
parallel  coordinate  system,  like  the  Uj ,  V; ,  w,  coordinate  system  showing  in  Figure  4.5  such 
that  Equations  4-8  and  4-9  can  be  applied.  Equations  4-8  and  4-9  are  re-written  as  follows. 

Xp  -  X.  =  A..[m;,x.  +  mjiY;  +  m3',(-f)]  =  X.u. 

Yp  -  Y.  =  Ajm/^x.  +  m^Ui  +  m3;(-f)]  =  X,v  (4-10) 
Zp  -  Z.   =  Aj[m,3Xj  +  ra^^y.  +  m33(-f)]  =  X.w. 


and 
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Xp  -  Xj  =  A.[m/,x.  +  m^^y.  +  m^.^-f)]  =  X.u^ 

Yp  -  Yj  =  X.[ml,x.  +  m^i^Yj  +  mj^.C-f)]  =  X.v.  (4-11) 

Zp  -  Z.   =  X.[ml,x.  +  m2*3y.  +  m,^ -f)]  = 

Equaling  Equation  4-10  and  4-1 1  yields: 

Xp  =        +  X^  =  A^u.  +  X.  (4-12) 

Yp  -  K\  +  Y.  =  A.V.  +  Y.  (4-13) 

^    ;     Z    =  X.w.  +  Z.  =  X.w.  +  Z.  (4-14) 

p  I     1  1  J     J  j 

Solving  Equations  4-12  and  4-13  for  X.  and  A  yields: 

^  ^  (X.  -  X,)v.  -  (Y.  -  Y,)u^ 


V  U  -  M  V 
J    I  J  I 


and: 


V  u.  -  U  V 
J  I      J  > 


(4-15) 


(X.  -  X  )v   -  (Y.  -  Y  )m 
A   =  '-^  '      '   '  ^  (4-16) 


If  two  specific  points  are  found  to  be  in  stereo  correspondence,  then  the 
corresponding  ground  object  coordinates  can  be  calculated  by  employing  Equations  4-12  to 
4-14.  Here,  the  geometric  constraint  condition  used  to  calculate  ground  coordinate  is 
employed  as  a  criterion  to  identify  whether  the  located  points  are  the  stereo  correspondence 
pair.  Specifically,  two  Z  ground  coordinates  are  calculated  fi-om  Equation  4-14  by  used  each 
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of  the  values,  A.  ,  X.  from  Equations  4-15  and  4-16.  This  can  indicate  how  good  the 
matching  is  because  for  those  points  which  are  in  the  stereo  correspondence,  the  calculated 
Z  coordinates  from  different  photos  shall  be  the  same. 
4.4-3  Similarity  Constraint  Condition 

When  any  two  points  are  determined  to  be  the  stereo  correspondences  for  each  other, 
not  only  the  geometric  constraint  condition  but  also  the  distributions  of  DNs  of  these  two 
points  and  their  neighborhoods  should  be  similar.  In  general,  there  are  two  methods  which 
have  shown  good  performance  for  aerial  photos;  one  is  the  normalized  cross-covariance  ( 
NCV ),  and  another  one  is  called  normalized  cross-correlation  ( NCC  The  coefficient 
of  normalized  cross-correlation  is  calculated  by  employing  Equation  2-3.  Matching  is  based 
on  the  NCC,  whose  value  is  between  1  and  -1.  When  the  value  of  NCC  is  close  to  1,  the 
matched  result  has  high  reliability.  In  the  developed  system,  only  those  points  which  pass 
the  geometric  constraint  and  are  larger  than  the  established  threshold  are  regarded  as  matched 
points.  Otherwise,  the  point  is  classified  as  a  mismatching,  and  its  corresponding  terrain 
information  is  interpolated  from  its  neighbors. 
4.4-4  Matching  I  -  the  Stereo  Matching  at  the  Coarsest  Scale 

The  matching  algorithm  used  in  this  research  is  based  mainly  on  the  extracted 
features  from  the  image  pyramids  created  by  applying  LOG  or  Mallat's  wavelet.  Figure  4.8 
illustrates  the  basic  idea  employed  in  matching  at  the  coarsest  scale.  In  general,  when  the 
LOG  or  wavelet  is  used  to  decompose  the  original  images  into  a  series  of  images  with 
different  scales,  there  is  some  insignificant  information  in  the  image  at  the  coarsest  scale 
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The  image  of  the  fourth  level  (left) 


The  image  of  the  fourth  level  (right) 
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Figure  4.8  Matching  procedure  applied  to  the  images  with  the  coarsest  scale. 
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which  need  to  be  eliminated.  The  easiest  way  to  elimiate  the  uncessary  information  is  to 
establish  a  magnitude  threshold.  For  those  features,  if  their  DNs  are  less  than  the  pre- 
established  threshold,  they  will  be  eliminated.  Then,  some  feature  points  are  picked  from 
the  left  image  as  a  template  in  order  to  find  its  stereo  correspondence  from  the  right  image. 
Not  all  the  feature  points  showing  in  the  left  unage  will  be  located  in  the  right  due  to  limited 
time.  Hence,  the  number  of  feature  points  to  be  extracted  depends  upon  the  distribution  of 
the  feature  points.  For  example,  there  may  be  a  total  of  5000  feature  points  extracted  from 
the  left  image  but  there  are  1200,  1800,  1500,  and  500  feature  points  showing  in  the  first, 
second,  third,  and  fourth  quadrant  of  the  image,  respectively.  In  order  to  reduce  the 
computation  intensity,  there  is  a  limh  of  800  feature  points  which  will  be  used  as  templates 
for  stereo  matching.  These  will  be  proportioned  according  to  how  many  feature  points  can 
be  found  from  each  quadrant,  therefore  there  will  be  192, 288, 240,  and  80  feature  points  in 
the  first,  second,  third,  and  fourth  quadrants,  respectively.  Hoping  to  uniformly  distribute 
the  feature  points  throughout  the  entire  left  image,  there  will  be  a  14  by  14  pattern  feature 
points  uniformly  distributed  in  the  first  quadrant  of  the  left  image.  Similarly,  there  will  be 
17  by  17, 16  by  16,  and  9  by  9  feature  points  uniformly  distributed  in  the  second,  third,  and 
fourth  quarter,  respectively.  Sometimes,  because  of  the  uniform  distribution,  there  will  be 
no  feature  point  which  can  be  located  at  the  exact  position,  so  the  feature  point  whose 
distance  between  the  selected  points  is  minimum  will  be  chosen. 

After  the  feature  points  of  the  left  image  have  been  selected,  a  search  window  will 
be  applied  in  the  right  image  to  extract  the  corresponding  feature  points  by  comparing  the 
orientations  of  the  template  point  and  the  feature  points  in  the  search  window.  If  the 
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differences  of  the  orientations  are  less  than  a  pre-selected  threshold,  like  — ,  then,  the 

3 

geometric  constraint  condition  is  employed  to  find  n,  for  example,  n  =  1 0,  feature  points 
which  are  the  possible  candidates  for  the  stereo  correspondence.  If  there  is  no  feature  point 
which  can  satisfy  the  geometric  constraint  condition,  there  are  no  stereo  correspondence 
which  can  be  found  in  the  selected  search  window.  Now,  the  size  of  the  search  window  is 
increased,  and  the  orientation  condition  and  geometric  constraint  condition  are  employed 
again  in  order  to  find  some  possible  candidates  for  the  stereo  correspondence.  Only  limited 
possible  candidates  exist.  The  template  point  and  its  neighborhood  are  selected  firom  the  left 
image.  Similarly,  the  possible  candidates  and  their  neighborhoods  are  selected  in  order  to 
test  which  candidate  point  can  satisfy  the  similarity  constraint  condition.  If  there  are  no 
candidates  which  can  satisfy  the  similarity  constraint  condition,  the  size  of  the  selected 
search  window  is  increased.  For  those  candidates  passing  the  similarity  constrain  condition, 
the  one  with  the  maximum  correlation  coefficient  is  the  best  candidate  for  the  stereo 
correspondence.  Then,  the  position  of  this  best  candidate  is  retained  for  the  stereo  matching 
on  the  next  finer  level.  However,  in  some  cases,  no  feature  point  can  satisfy  the  orientation, 
geometric  constrain,  and  the  similarity  constraint  conditions.  Then,  the  size  of  the  search 
window  is  increased  till  the  maximum  window  size  is  reached;  for  example  200  by  200 
pixels. 

The  reason  to  employ  the  geometric  constraint  condition  before  the  similarity 
constraint  condition  is  to  reduce  the  processing  time.  Computing  the  geometric  constraint 
condition  is  very  straightforward  by  employing  Equations  4-12,  4-13,  4-14,  4-15,  and  4-16, 
while  a  great  deal  of  computation  is  required  in  employing  the  similarity  constraint  condition 
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when  the  specified  neighborhoods  are  large. 

Sometimes,  because  the  search  window  can  extend  its  size  such  that  same  matched 

result  can  be  repeated  several  times,  a  procedure  to  eliminate  the  repeated  results  is 

necessary.  Another  important  reason  for  eliminating  repeated  results  is  that  the  matched 

resuhs  will  be  potential  matched  candidates  at  the  next  finer  level.  If  the  repeated  results 

cannot  be  eliminated,  then  the  processing  time  for  matching  at  the  next  finer  level  will  be 

increased  because  of  the  number  of  the  repeated  results.  The  elimination  procedure  is  to  set 

up  a  counter  for  each  matched  resuh.  For  example,  for  those  matched  results  which  show 

up  the  first  time,  the  number  of  their  counter  is  1 .  For  those  matched  results  which  show 

up  the  second  time,  the  number  of  their  counter  is  2.  Only  those  counter  whose  value  is  1 

will  be  kept  in  the  matched  results  for  the  next  finer  level.  Otherwise,  the  matched  results 

will  be  removed  from  consideration. 

4.4-5  Matching  11  -  Stereo  Matching  at  A  Finer  Scale 

The  matching  procedure  applied  in  the  first,  second,  and  third  level  has  a  slight 
difference  with  the  matching  procedure  used  at  the  fourth  level.  From  unage  decomposition, 
each  level  represents  the  unages  with  a  specified  scale.  In  Figure  4.9,  the  procedure  applied 
to  the  images  with  the  finer  scales  does  not  have  a  sub-procedure  to  expand  the  size  of  search 
window  which  is  used  in  the  matching  procedure  applied  to  the  images  with  the  coarsest 
scale.  The  main  reason  is  that  the  matched  results  from  the  images  with  coarser  scale  are 
supposed  to  provide  good  approximations  for  the  positions  of  the  possible  matching 
candidates  in  the  images  with  the  finer  scales.  In  some  cases,  the  matched  results  at  the 
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coarser  scales  cannot  provide  good  approximations  of  the  possible  matching  candidates  at 
the  finer  scales  because  the  matched  results  at  the  coarser  scale  violate  the  causal  condition 
[i9][22]  yj^g  causal  condition  is  that  a  multiresolution  presentation  is  supposed  to  have  the 
property  that  no  spurious  information  is  generated  passing  fi-om  finer  to  a  coarser  scale.  For 
those  matched  results  violating  the  causal  condition,  they  can  be  eliminated  by  applying  the 
pyramid  matching  procedure. 

Similarly,  the  geometric  constraint  condition  and  similarity  constraint  condition  are 
employed  to  find  the  stereo  correspondences  in  the  fmer  levels.  Here,  the  size  of  the  search 
window  will  not  be  extended,  and  the  thresholds  set  for  the  geometric  constraint  condition 
and  similarity  constraint  condition  will  be  changed  at  each  level.  The  idea  of  changing  the 
threshold  of  the  geometric  constraint  condition  is  the  hope  that  at  each  level,  the  height 
difference  can  be  reduced,  and  finally  can  reach  the  acceptable  range.  As  for  changing  the 
threshold  of  the  similarity  constraint  condition,  at  each  finer  level,  the  information  is  not  as 
smooth  as  that  in  the  coarsest  level.  Hence,  the  smaller  threshold  is  necessary  when  the 
information  in  the  fmer  level  is  processed. 


The  image  of  the  fourth  kvel  (kft) 


The  image  of  the  fourth  level  (right) 


Matched  results  from  previous  level 


Yes 

Yes 

1  > 

Start  Matching 

<  1 

 > 

Figure  4.9  Matching  procedure  applied  to  images  with  the  finer  scales. 


4.5  Digital  Terrain  Model 
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After  the  stereo  matching  has  been  done,  the  ground  coordinates  used  to  describe  the 
surface  covering  the  specified  area  can  be  calculated  by  applying  Equations  4-12, 4-13,  and 
4-14.  As  for  those  places  where  elevations  are  not  available,  their  values  are  interpolated 
from  neighboring  Z.  Two  kinds  of  approaches  are  employed  to  create  a  series  of  grids  to 
describe  the  ground  surface.  One  is  to  limit  the  number  of  the  neighbors  which  surround  the 
interpolated  points.  Another  one  is  to  set  the  search  range  with  the  interpolated  point  as  the 
center.  Both  approaches  are  employed  to  eliminate  those  points  whose  interpolation  is 
insufficient  because  not  enough  data  point  were  used  to  interpolate.  With  enough  data, 
different  kinds  of  polynomial  or  spline  functions  can  be  employed  to  interpolate  the  point 
in  the  grid.  In  this  research,  the  grid  of  the  ground  surface  is  created  from  a  commercial 
package,  SURFER.  Using  SURFER,  the  method  of  the  inverse-distance  weighting,  was 
selected  for  data  interpolation. 


CHAPTER  5 
EXPERIMENT  RESULTS 


5.1  Introduction 


A  test  was  performed  to  investigate  the  feasibility  of  the  developed  system.  The  file 
sizes  of  the  original  stereo  pair  shown  in  the  Figures  3.2  and  3.2  are  large;  each  one  occupies 
almost  30  MG  hard-drive  space.  In  order  to  reduce  to  storage  space,  the  sub-images  are 
extracted  from  the  original  stereo  pair,  and  they  shown  in  Figures  5.1  and  5.2.  Each  sub- 
image  has  1024  columns  and  1024  rows.  Laplacian  and  Gaussian  filters  with  different  sizes 
and  Mallat's  wavelet  were  employed  to  decompose  the  original  images  into  a  series  of 
images  with  different  scales.  Then,  the  matching  procedure  mentioned  in  previous  chapter 
was  applied  to  locate  the  stereo  correspondences  of  the  images.  Finally,  the  digital  terrain 
model  in  the  region  covered  by  the  sub-image  was  created. 

In  this  chapter.  Section  5.2  shows  the  result  of  performing  interior  orientation  to  the 
stereo  pair;  Section  5.3  presents  the  image  coordinates  of  the  photo  control  points  shown  in 
the  images,  and  the  camera  parameters  of  the  photos  by  applying  bundle  adjustment;  Section 
5.4  shows  the  image  pyramids  formed  by  using  Laplacian  and  Gaussian  and  Mallat's 
wavelet  separately,  and  the  digital  terrain  model  created  by  applying  matching  results  fi-om 
image  pyramids;  Section  5.5  shows  the  results  created  by  manually  operating  the  plotter. 
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5.2  Experimental  Results  from  Interior  Orientation 
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In  this  research,  only  one  template  was  created  for  the  fiducial  marks  shown  at  the 

comers  of  the  given  photos.  Table  5-1  and  5-2  show  the  image  coordinates  of  the  fiducial 

marks  in  the  left  and  right  image  respectively. 

Table  5-1  Image  Coordinates  of  Fiducial  Marks 
from  the  Left  Image 


Points 

Rows 

Columns 

1 

5512.809 

5534.629 

2 

213.290 

5503.898 

3 

243.032 

203.996 

4 

5542.786 

236.455 

Table  5-2  Image  Coordinates  of  Fiducial  Marks 
from  the  Right  Image 


Point 

Rows 

Columns 

1 

5520.501 

5554.895 

2 

218.945 

5531.124 

3 

242.103 

231.238 

4 

5543.549 

255.102 
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After  the  image  coordinates  of  fiducial  marks  were  located,  the  relationships  between 
photos  and  images  were  established  by  applying  the  affme  transformation  shown  in  Equation 
4-7.  The  computed  parameters  for  the  2-D  affine  transformations  are  presented  in  Tables  5-3 
and  5-4. 

'     .  Table  5-3  Parameters  of  the  2D  Affme 
Transformation  for  the  Left  Image 


Parameter 

Value 

ai 

-0.039993 

«2 

-0.000230 

«3 

115.7956 

b, 

-0.000241 

bz 

0.039995 

bs 

-114.1253 

Table  5-4  Parameters  of  the  2D  Affme 
Transformation  for  the  Right  Image 


Parameter 

Value 

»! 

-0.039987 

»2 

-0.000170 

"3 

115.7251 

b, 

-0.000185 

b, 

0.0399970 

ba 

-115.2179 
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5.3  Experimental  Results  for  the  Photo  Control  Points  and  Camera  Parameters 

(      .    ,     .      .  ^  1    -  '     -      V   "  * 

Before  relative  orientation  and  absolute  orientation  were  computed,  the  coordinates 
of  the  control  points  had  to  be  located  first.  Figure  5.3  illustrates  their  positions  in  the  left 
and  right  images.  Table  5-5  shows  the  image  coordinates  of  the  photo  control  points. 
Table  5-5  Image  Coordinates  of  the  Photo  Control  Points 


Point 

Row 

Column 

Row 

Column 

48-3A 

1400.8 

3006.4 

1065.7 

787.0 

48-3B 

1629.1 

3008.3 

1309.7 

796.7 

49-3A 

920.0 

4795.0 

554.9 

2605.2 

49-3B 

1000.9 

4918.5 

645.8 

2734.8 

55-2A 

2433.5 

3398.9 

2146.5 

1204.6 

55-2B 

2709.1 

3588.6 

2429.4 

1403.7 

55-3A 

4784.2 

3469.1 

4417.9 

1334.7 

55-3B 

4824.7 

3468.3 

4453.9 

1333.7 

56-2A 

2189.6 

5577.1 

1915.8 

3385.5 

56-2B 

2309.9 

5669.5 

2043.6 

3471.1 

56-3A 

4497.8 

5489.3 

4183.8 

3248.7 

56-3B 

4529.8 

5539.5 

4214.1 

3297.1 

(Left  Image)  (  Right  Image) 
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(A)  Control  Point  48-3A  (L) 


(B)  Control  Point  48-3B  (L) 


(C)  Control  Point  49-3A  (L) 

Figure  5.3  The  photo  control 


_   jl 

1 

\ 

(A')  Control  Point  48-3  A  (R) 


(B')  Control  Point  48-3B  (R) 


(C)  Control  Point  49-3A  (R) 
shown  in  Figures  3.2  and  3.3. 
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(D)  Control  Point  49-3B  (L)  (D')  Control  Point  49-3B  (R) 


(E)  Control  Point  55-2A  (L)  (E')  Control  Point  55-2A  (R) 


(F)  Control  Point  55-2B  (L)  (F')  Control  Point  55-2B  (R) 


Figure  5.3  The  photo  control  points  shown  in  Figures  3.2  and  3.3  (continued). 


Figure  5.3  The  photo  control  points  shown  in  Figures  3.2  and  3.3  (continued). 
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(J)  Control  Point  56-2B  (L)  (J')  Control  Point  56-2B  (R) 


(K)  Control  Point  56-3A  (L)  (K')  Control  Point  56-3A  (R) 


(L)  Control  Point  56-3B  (L)  (L')  Control  Point  56-3B  (R) 


Figure  5.3  The  photo  control  points  shown  in  Figures  3.2  and  3.3  (continued). 


94 

After  the  positions  of  the  photo  control  points  in  the  image  coordinate  system  were 
located,  the  corresponding  positions  in  photo  coordinate  system  were  calculated  by  applying 
the  transforming  parameters  of  the  2-D  affme  transformation  shown  in  Tables  5-3  and  5-4. 
Table  5-6  gives  the  corresponding  photo  coordinates  for  the  photo  control  points. 


Table  5-6  Photo  Coordinates  of  the  Photo  Control  Points 


Point 

x(mm) 

y(mm) 

x(mm) 

y(mm) 

48-3A 

5.737 

59.121 

-83.979 

73.019 

48-3B 

5.758 

49.990 

-83.635 

63.258 

49-3A 

77.390 

11.91)1 

-11.202 

93.135 

49-3B 

82.454 

74.565 

-5.930 

89.387 

55-2A 

21.186 

17.770 

-67.473 

29.728 

55-2B 

28.706 

6.665 

-59.562 

18.383 

55-3A 

23.429 

-76.297 

-62.704 

-61.121 

55-3B 

23.388 

-77.915 

-62.734 

-62.560 

56-2A 

108.362 

26.985 

19.799 

38.583 

56-2B 

112.031 

22.151 

23.197 

33.460 

56-3A 

104.296 

-65.267 

13.833 

-52.122 

56-3B 

106.297 

-66.599 

15.837 

-53.306 

(Left  Image)  (Right  Image) 


Combining  the  photo  coordinates  of  the  photo  control  points  and  their  corresponding 
ground  coordinates  (  established  from  aerial-triangulation ),  the  camera  parameters  can  be 
calculated  by  applying  the  bundle  adjustment.  Table  5-7  shows  the  computed  results  for 
camera  parameters. 
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Table  5-7  Calculated  Camera  Parameters  by  Applying  the  Bundle  Adjustment 


Photo 


Omega"       Phi°      Kappa"     X^(M)      Y^(M)  Z^(M) 


Left 


2.0756 


0.3998 


0.0041    10063.908  9717.948  551.657 


Right 


-3.1097 


1.0509 


0.5838    10354.623  9726.253  551.322 


5.4  Image  Pyramids 


In  this  research,  two  kinds  of  image  pyramids  were  created;  one  was  created  by 
applying  the  Laplacian  and  Gaussian  filters  with  different  sizes,  and  the  other  was  created 
by  using  Mallat's  wavelet. 
5.4-1  LOG  Pyramids 

Now,  the  LOG  pyramid  is  introduced.  Figure  5.4  shows  the  image  pyramids 
creating  by  applying  Laplacian  and  Gaussian  filter  with  different  sizes,  9  by  9,  15  by  15,  21 
by  21,  and  33  by  33,  to  the  left  and  right  images. 

After  the  image  pyramids  haye  been  created  by  applying  LOG  filter  with  different 
sizes,  the  locations  of  zero-crossings  are  detected  at  the  each  level  of  the  image  pyramids. 
Some  magnitude  thresholds  were  established  to  eliminate  redundant  information.  Table  5-8 
shows  the  thresholds  and  the  average  processing  time  at  each  level  of  the  image  pyramids. 
Figure  5.5  shows  the  positions  of  zero-crossings  at  each  level  of  the  image  pyramids. 
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98 
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99 


100 


102 


103 
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Table  5-8  Thresholds  and  Average  Processing  Time  for  Running  LOG 
Filter  with  Different  Sizes,  9,  15,  21,  and  33 


Photo 

Filter  Size 

Threshold 

Average 
Processing 
Time  (Sec.)* 

left9.img 

9x9 

5.50 

442 

leftlS.img 

15x  15 

4.50 

921 

left21.img 

21  x21 

3.50 

■  1980 

Ieft33.img 

33x33 

2.50 

3934 

right9.img 

9x9 

5.25 

/  444 

rightlS.img 

15x  15 

4.25 

924 

right21.img 

21  x21 

3.25 

1806 

right33.img 

33x33 

2.25 

4037 

*  Average  Processing  Time  is  based  on  running  same  program  three 
times 


The  stereo  matching  scheme  mentioned  in  Chapter  4  was  then  executed,  and  the 
matched  results  were  interpolated  by  the  software,  SURFER.  The  interpolated  results  were 
used  to  create  3D  plots.  Figures  5.6  (A)  and  5.6  (B)  show  the  3D  plots  with  different 
viewing  angles.  The  parameters  of  the  3D  views  are  listed  in  Table  5-9. 


Table  5-9  Viewing  Parameters  for  the  3D  Plots 


Items 

View  Parameter 

Projection 

Orthographic 

Rotation  about  Z  axis 

225°, 45° 

Tilt  after  rotation 

30° 

Figure  5.6  (A)  The  3D  plot  with  noise  disturbances  and  a  45°  rotation  about  the  Z 
axis. 


Figure  5.6  (B)  The  3D  plot  with  noise  disturbances  and  a  225°  rotation  about  the 
Z  axis. 
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5.4-2  Wavelet  Pyramids 

The  wavelet  pyramids  are  created  by  applying  Mallat's  wavelet  method'^*'.  Figure 
5.7  shows  the  magnitude  information  of  the  image  pyramids  of  the  left  and  right  images  at 
the  different  scales,  2  "\  2  ^  2  ^  2".  Table  5-10  shows  the  thresholds  and  the  average 
processing  time  for  the  image  pyramids  at  each  scale.  Figure  5.8  shows  the  positions  of  the 
local  maxima  at  each  scale  level  of  the  image  pyramids  by  employing  the  thresholds 
presented  in  Table  5-10.  The  stereo  matching  scheme  mentioned  in  Chapter  4  was 
subsequently  executed,  and  the  matched  results  are  interpolated  by  the  software,  SURFER. 
Figure  5.9  (A)  and  (B)  shows  the  3D  plots  from  different  viewing  angles  which  are  rotated 
about  the  Z  axis. 


Table  5-10  Thresholds  and  Average  Processing  Time  for  Creating  Wavelet 
Pyramids. 


Photo 

Scale 

Threshol 
d 

Average  Processing 
Time(  Sec.)* 

leftl.img 

2-' 

24.50 

114 

Ieft2.img 

2-2 

14.50 

114 

leftB.img 

2-3 

15.50 

115 

left4.img 

2-^ 

27.50 

113 

rightl.img 

2-' 

24.0 

114 

rightZ.img 

2-2 

14.00 

114 

rightS.img 

15.00 

114 

right4.img 

2-4 

25.50 

114 

♦Average  processing  time  is  based  on  running  the  same  program  three 
times. 
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Figure  5.9  (B)  The  3D  plot  with  noise  disturbances  and  a  225°  rotation  about  the 
Z  axis. 
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5.5  Results  from  Manually  Operating  A  Stereo  Plotter 

Traditionally,  a  digital  terrain  model  was  created  by  operating  a  stereo  plotter.  The 
basic  procedures  for  operating  a  stereo  plotter,  include 

(1)  Inner  orientation, 

(2)  Relative  and  absolute  orientation, 

(3)  Finding  the  stereo  correspondences, 

(4)  Displaying  the  ground  coordinates  (X,  Y,  Z  ), 

(5)  Outputting  the  measured  results. 

V  In  this  research,  a  stereo  plotter,  the  Kern  DSR-14,  was  used  to  measure  the  photo 
coordinates  of  the  photo  control  points  and  to  create  the  digital  terrain  model  of  the  area 
shown  in  the  Figs  5.1  and  5.2.  The  basic  procedure  of  the  inner  orientation  is  to  measure 
the  coordinates  of  the  fiducial  marks  shown  in  the  Figure  3.1.  The  measured  coordinates  for 
the  fiducial  marks  shown  in  the  left  photo  and  right  photo  are  presented  in  Table  5-11. 

After  the  positions  for  the  fiducial  marks  have  been  located,  a  2D  affine 
transformation  was  established  to  transform  the  measured  coordinates  into  the  pre-calibrated 
photo  coordinate  system.  From  the  measured  and  pre-calibrated  coordinates,  the  coefficients 
of  the  2D  affine  transform  were  calculated  by  employing  a  least-squares  approach  such  that 
any  measured  coordinate  can  be  transformed  to  its  corresponding  calibrated  coordinate. 

As  for  the  relative  orientation  ,  the  photo  control  points  shown  in  the  photos  are 
located  by  viewing  simultaneously  the  same  photo  control  point  from  the  left  and  right 
images  through  the  viewing  device  of  the  stereo  plotter.  Table  5-12  shows  the  measured 
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coordinates  of  the  photo  control  points  from  the  left  photo  and  right  photo  shown  in  Figures 
3.2  and  3.3.  Then,  the  information  about  the  corresponding  ground  coordinates  of  the  photo 
control  points  is  input  into  the  stereo  plotter  such  that  a  true,  object  space,  stereo  model  is 
created. 

Combining  the  measured  photo  coordinates  of  the  photo  control  points  and  their 
corresponding  groimd  coordinates  (  from  aero-triangulation  ),  the  camera  parameters  are 
calculated  by  the  program  installed  in  the  stereo  plotter.  Table  5-13  shows  the  calculated 
camera  parameters. 


Table  5-1 1  Coordinates  for  the  Fiducial  Marks  Measured  by  Stereo  Plotter 


Point 

x(inm) 

y(min) 

x(mm) 

y(min) 

1 

4.930 

20.773 

7.668 

21.917 

2 

216.957 

21.662 

219.689 

23.226 

3 

215.930 

233.696 

218.130 

235.252 

4 

3.903 

232.806 

6.110 

233.946 

5 

0.418 

126.783 

2.885 

127.915 

6 

220.443 

127.704 

222.906 

129.272 

7 

109.899 

237.260 

112.096 

238.607 

8 

110.975 

17.213 

113.713 

18.564 

(Left  Image)  (Right  Image) 
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Table  5-12  Photo  Coordinates  of  the  Photo  Control  Points  Measured  by 
Stereo  Plotter 


Point 

•\(  tTim^ 

Al  111111  1 

TtlTtl^ 
A.I  111  111  1 

59  079 

-83  966 

72  981 

48-lB 

5  826 

49  979 

-83  546 

63  287 

49-lA 

11  416 

11  934 

-11  174 

11*1  /  ~ 

93087 

82  41 1 

74  584 

-5  934 

89  409 

17  Mf\ 

U  /  .*T  1  "T 

U.J  17  / 

1  8  376 

1  O.J  /  \J 

■;  55-3A 

23.453 

-76.342 

-62.690 

-61.142 

55-3B 

23.437 

-77.938 

-62.704 

-62.608 

56-2A 

108.398 

26.951 

19.813 

38.507 

56-2B 

112.039 

22.079 

23.225 

33.431 

56-3A 

104.240 

-65.353 

13.897 

-52.127 

56-3B 

106.388 

-66.609 

15.893 

-53.309 

(Left  Image)  (Right  Image) 


Table  5-13  Camera  Parameters  Determined  from  Stereo  Plotter  Measurements 


Photo 

Omega' 

Phi° 

Kappa° 

X^(M) 

Y^(M) 

Z^(M) 

Left 

2.1076 

0.5288 

-0.0166 

10065.018 

9717.794 

552.270 

Right 

-2.9887 

1.1640 

0.5648 

10355.675 

9725.187 

551.136 

Finally,  the  stereo  correspondences  were  located.  For  a  human  being,  this  is  done 
by^ viewing  the  same  point  simultaneously  in  the  left  photo  and  right  photo  through  the  stereo 
viewing  device  in  the  stereo  plotter,  and  the  operator  can  observe  that  the  floating  mark 
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shown  in  the  viewing  device  appears  to  be  on  the  ground.  Initially,  the  operator  sees  the 
floating  mark  above  or  below  the  ground  from  the  viewing  device  of  the  stereo  plotter,  and 
then,  adjusts  the  z-controUer  of  the  stereo  plotter  to  make  sure  the  floating  mark  is  on  the 
ground.  For  some  cases,  like  trees  or  buildings  which  will  block  the  view,  the  operator  can 
adjust  the  position  of  the  floating  mark  to  avoid  collecting  the  data  for  the  digital  terrain 
model  at  those  places  which  the  view  is  blocked.  Eventually,  the  operator  will  record  those 
points  that  he  or  she  feels  that  they  are  the  stereo  correspondences  from  the  given  stereo  pair. 
The  recorded  data  are  used  to  interpolate  the  elevations  of  those  points  without  any 
information  available.  For  example,  for  places  like  buildings  and  trees,  their  elevations  are 
interpolated  by  their  neighbors  whose  elevations  can  be  collected  from  the  stereo  plotter. 

The  software  which  controls  the  stereo  plotter  has  a  particular  fimction  to  collect  a 
digital  terrain  model.  An  operator  needs  to  decide  at  what  linear  interval  in  the  area  of 
interest  he  or  she  wants  to  collect  data  for  creating  digital  terrain  model.  The  intervals,  one 
in  X  direction  and  the  other  in  y  direction,  are  input  into  the  software.  When  the  operator 
starts  to  collect  the  data,  the  computer  sends  a  message  to  the  stereo  plotter.  Then,  the 
floating  mark  in  the  stereo  plotter  is  moved  to  the  particular  position  according  to  the  pre- 
defined intervals.  The  operator  will  collect  the  data  by  adjusting  the  elevation  of  the  floating 
mark  until  it  appears  to  rest  the  ground.  In  some  cases,  floating  mark  is  moved  to  the  place 
where  trees  or  buildings  are  located.  The  operator  must  then  to  decide  whether  he  or  she 
wants  to  change  the  x,  y  position  of  the  floating  mark  position  or  skip  the  point  altogether. 
For  this  research,  a  digital  terrain  model  was  collected  by  establishing  the  intervals  in  x 
direction  and  y  direction  to  be  5  meters.  The  resulting  data,  at  a  nominal  spacing  of  5  meters. 
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have  X,  y,  and  z  values  recorded  for  each  point.  These  collected  data  were  then  used  to 
interpolate  the  elevations  for  those  points  which  were  not  measured  by  the  operator.  The 
interpolated  results  are  shown  in  Figure  5.10.  Figures  5.10  (A)  and  (B)  show  the  3D  plots 
by  rotating  the  whole  terrain  model  about  the  Z  axis  at  different  angles  of  45*  and  225*, 
respectively. 


Figure  5.10  (A)  Digtial  terrain  model  collected  from  the  stereo  plotter  with  a  45° 
roation  about  the  Z. 


Figure  5.10  (B)  Digital  terrain  model  collected  from  the  stereo  plotter  with  a  225° 
rotation  about  the  Z  axis. 


CHAPTER  6 
EVALUATION  AND  DISCUSSION 


6.1  Introduction 

In  this  research,  the  results  calculated  from  the  developed  approach  were  compared 
with  results  measured  from  manually  operating  a  stereo  plotter.  In  the  proposed  approach, 
the  results  can  roughly  be  divided  into  two  parts;  one  part  contains  the  coordinates  of  the 
photo  confrol  points  and  camera  parameters,  and  another  part  is  the  digital  terrain  model. 
The  differences  between  the  calculated  coordinates  and  measured  coordinates  are  presented 
and  evaluated.  The  digital  terrain  models  created  by  applying  the  proposed  approach  are 
compared  with  the  digital  terrain  model  created  by  manually  operating  the  stereo  plotter.  In 
the  proposed  approach,  two  kinds  of  image  pyramids  were  created;  namely,  one  was  created 
by  employing  Laplacian  and  Gaussian  filters  with  different  sizes,  and  the  other  was  created 
by  applying  Mallat's  wavelet.  In  order  to  evaluate  the  created  digital  terrain  models  by 
employing  different  image  pyramids,  a  digital  terrain  model  was  created  by  maniially 
operating  the  Kern  DSR-14  stereo  plotter.  The  comparisons  between  different  digital  terrain 
models  are  evaluated  and  discussed  in  this  chapter. 

In  this  chapter.  Section  6.2  compares  the  differences  between  the  measured  and 
computed  results,  including  the  positions  of  the  photo  control  points;  Section  6.3  shows  the 
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differences  between  different  digital  terrain  models  in  quantity;  Section  6.4  evaluates  the 
developed  system's  C  factor. 

6.2  The  Differences  between  Measured  and  Computed  Results 

In  this  section,  the  measured  x,y  coordinates  of  the  photo  control  points  shown  in 
each  photo  of  the  given  stereo  pair  are  compared  with  the  calculated  coordinates  obtained 
by  the  approach  developed  in  this  research.  When  using  a  stereo  plotter,  the  stereo 
'  correspondences  are  located  by  the  floating  dot.  The  floating  dot  is  seen  when  the  eyes  fuse 
two  separate  dots  together,  one  which  is  projected  on  the  left  photo  and  another  on  the  right 
photo.  In  a  stereo  plotter,  the  operator's  left  eye  sees  only  images  from  the  left  photo  along 
with  the  corresponding  left  dot  and  the  operator's  right  eye  simultaneously  sees  only  images 
and  corresponding  dot  on  the  right  photo.  When  operating  the  stereo  plotter,  the  operator  can 
see  the  floating  dot  moving  around  the  3D  model  in  a  stereo  plotter.  The  floating  dot  thus 
serve  as  an  index  mark  used  to  measure  X,Y,Z  coordinates  of  points  of  interest.  In  other 
words,  its  function  is  similar  to  the  function  of  a  cursor  in  a  tablet  digitizer.  The  difference 
between  a  stereo  plotter  and  digitizer  is  that  the  operator  can  see  the  stereo  correspondences 
from  the  stereo  plotter  in  three  dimensions  while  on  the  digitizer,  the  operator  can  only  see 
the  plane  features  without  stereo  correspondences.  The  size  of  the  floating  dot  can  be 
adjusted  when  the  operator  feels  it  is  too  large  or  small.  However,  even  at  the  smallest 
setting  the  floating  dot  still  occupies  several  corresponding  pixels  in  the  digitized  photos. 
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From  Section  5.5,  before  calculating  the  camera  parameters,  the  fiducial  marks  of 
both  photos  and  the  photo  control  points  need  to  be  located  in  the  given  stereo  pair.  The 
positions  of  fiducial  marks  and  photo  control  points  are  used  to  establish  the  interior 
orientations  and  relative  orientation  of  the  given  stereo  pair.  After  the  interior  and  relative 
orientations  are  done,  the  stereo  plotter  can  re-create  the  same  geometric  condition  which 
existed  when  the  photos  were  taken.  Similarly,  the  developed  approach  has  the  capability 
to  automatically  find  the  positions  of  fiducial  marks  in  the  photos  and  semi-automatically 
locate  the  positions  of  the  photo  control  points  to  sub-pixel  accuracy.  Tables  6-1  and  6-2 
show  the  difference  between  the  measured  (  from  stereoplotter  )  and  calculated  (  from 
developed  approach  )  coordinates  of  the  photo  control  points  from  the  left  and  right  photo, 
respectively. 


Table  6-1  Differences  between  Measured  and  Calculated  Coordinates  from  the 
Left  Photo.  ( All  values  in  mm  ) 


Point 

x'^ 

AX 

Ay 

48-3A 

5.726 

5.737 

-0.051 

59.079 

59.081 

-0.002 

48-3B 

5.826 

5.798 

0.028 

49.979 

49.990 

-0.011 

49-3A 

77.416 

77.430 

-0.014 

77.934 

77.897 

0.037 

49-3B 

82.411 

82.347 

0.064 

74.584 

74.634 

-0.050 

55-2A 

21.225 

21.226 

-0.001 

17.696 

17.690 

0.006 

55-2B 

28.699 

28.746 

-0.047 

6.597 

6.625 

-0.028 

55-3A 

23.453 

23.469 

-0.016 

-76.342 

-76.337 

-0.005 

55-3B 

108.398 

23.427 

0.010 

-77.938 

-77.955 

0.017 

56-2A 

112.039 

108.402 

-0.004 

26.951 

26.944 

0.007 

M  =  Measured,  C  =  Calculated. 


125 


Table  6-1  Differences  between  Measured  and  Calculated  Coordinates  from  the 
Left  Photo.  (Continued )  ( All  values  in  mm  ) 


Point 

AX 

Ay 

56-2B 

112.039 

112.070 

-0.031 

22.079 

22.111 

-0.032 

56-3A 

104.240 

104.336 

-0.096 

-65.353 

-65.347 

-0.006 

56-3B 

106.388 

106.336 

0.052 

-66.609 

-66.639 

0.030 

M  =  Measured,  C  =  Calculated. 


Table  6-2  Differences  between  Measured  and  Calculated  Coordinates  from  the 
Right  Photo.  ( All  values  in  mm ) 


Point 


AX 


,M 


Ay 


48-3A 

48-  3B 

49-  3A 
49-3B 
55-2A 
55-2B 
55-3A 

55-  3B 

56-  2A 
56-2B 
56-3A 
56-3B 


-83.966 
-83.546 
-11.174 
-5.934 
-67.414 
-59.529 


19.813 
23.225 
13.897 


-83.939 
-83.596 
-11.122 
-5.955 
-67.433 
-59.522 


-62.690  -62.664 
-62.704  -62.694 


19.839 
23.237 
13.949 


15.893  15.876 


-0.027 
0.050 
-0.052 
0.021 
0.019 
-0.007 
-0.026 
-0.010 
-0.026 
-0.012 
-0.052 
0.017 


72.981 
63.287 
93.087 
89.409 
29.682 
18.376 
-61.142 
-62.608 
38.507 
33.431 
-52.127 
-53.309 


72.978 
63.218 
93.094 
89.439 
29.687 
18.343 
-61.161 
-62.599 
38.542 
33.420 
-52.127 
-53.346 


0.003 
0.069 
-0.007 
-0.030 
-0.005 
0.033 
0.019 
-0.009 
-0.035 
0.011 
0.000 
0.037 


M  =  Measured,  C  =  Calculated. 


Based  on  the  results  between  measured  and  calculated  coordinates  shown  in  Tables 
6-1  and  6-2,  the  mean  and  RMS  differences  were  calculated  and  are  presented  in  Table  6-3. 
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Table  6-3  Summary  of  Differences  between  the  Measured  and  Calculated 
Coordinates.  (  All  values  in  mm) 


Photo 

Mean  ax 

Mean  Ay 

RMS  AX 

RMS  Ay 

Left 

-0.009 

-0.003 

0.037 

0.024 

Right 

-0.009 

0.007 

0.030 

0.028 

Given  the  pixel  size  of  0.04  mm,  the  means  and  RMS  for  the  measured  and  calculated 
coordinates  are  within  a  pixel  range  in  the  digital  domain.  For  the  worst  case  in  the  left 
image,  there  are  2.40  and  1 .25  pixel  differences  in  the  x  and  y  directions,  respectively.  As 
for  the  worst  case  in  the  right  image,  there  are  1 .30  and  1 .73  pixel  differences  in  the  x  and 
y  directions,  respectively. 

As  for  the  camera  parameters  of  the  given  photos,  the  difference  between  the 
developed  system  (  shown  in  Table  5-7  )  and  stereoplotter  (  shown  in  Table  5-13  )  is 
summarized  in  Table  6-4. 


Table  6-4  Summary  the  Difference  of  the  Camera  Parameters  Determined  from 
Developed  Approach  and  Stereoplotter  Measurements 


Photo 

Omega® 

Phi" 

Kappa° 

X^(M) 

Y^CM) 

Z^(M) 

Left 

-0.0320 

-0.1290 

0.0207 

-1.110 

0.154 

-1.043 

Right 

-0.1210 

-0.1131 

0.0190 

-1.052 

1.066 

0.186 
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6.3  Comparisons  between  Different  Digital  Terrain  Models 

In  this  section,  three  kinds  of  digital  terrain  models  are  compared.  In  this  research, 
'  two  kinds  of  image  pyramids,  Laplacian  and  Gaussian  image  pyramid  and  Mallat's  wavelet 
image  pyramid,  have  been  employed  to  decompose  the  given  stereo  pair  into  images  at 
different  scales.  Two  digital  terrain  models  are  created  by  employing  two  kinds  of  image 
pyramids  with  the  developed  approach.  The  third  one  was  created  by  manually  operating 
the  stereo  plotter. 

By  definition,  a  digital  terrain  model  does  not  contain  the  information  about 
buildings,  trees,  cars,  ponds,  and  bushes.  When  a  stereo  plotter  is  used  to  collect  data  for  a 
digital  terrain  model,  an  operator  can  adjust  the  pointing-device  in  the  stereo  plotter  to 
change  or  skip  those  places  where  are  occupied  by  buildings,  trees,  cars,  ponds,  and  bushes 
such  that  the  digital  terrain  model  created  by  operating  the  stereo  plotter  does  not  contain  the 
information  for  those  places.  However,  in  the  developed  approach,  the  computer  did  not 
avoid  those  places  which  are  occupied  by  buildings,  trees,  and  bushes  when  they  were 
encountered  during  processing.  Therefore,  some  post-processing  procedures  were  employed 
to  process  the  collected  data  in  order  to  compare  the  digital  terrain  models  created  by 
different  approaches.  From  Figures  5.1  and  5.2,  the  ranges  of  the  those  places,  which  are 
occupied  by  building  and  bushes,  are  measured  such  that,  for  those  collected  data  falling  in 
the  measured  ranges,  they  can  be  eliminated.  The  rest  of  the  collected  data  was  then  used 
to  numerically  describe  the  surface  illustrated  in  Figures.  5.1  and  5.2.  In  the  post-processing 
procedure,  another  purpose  is  to  eliminate  those  collected  data  point  which  were  not  properly 
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matched  in  stereo.  In  fact,  no  matter  what  method  has  been  proposed  to  solve  the  stereo- 
matching  problem,  there  is  no  existing  approach  which  can  claim  error-free  matching.  After 
the  matched  results  were  used  to  create  the  grid  data  for  the  area  covered  by  the  photos,  if 
the  differences  between  the  elevation  of  a  particular  grid  point  and  the  elevations  of  its 
corresponding  neighbors  are  larger  than  a  pre-defined  threshold,  the  point  is  considered  as 
an  error.  Then,  from  this  particular  point  it  was  traced  back  to  matched  results  and  to  find 
out  that  which  matched  result  causes  this  problem  and  it  is  eliminated  from  the  collected 
data.  In  doing  so,  Figures.  6.1  (A)  and  (B)  show  the  digital  terrain  model  created  by  the 
developed  approach  with  Laplacian  and  Gaussian  pyramid  and  rotations  about  the  Z  axis  of 
45°  and  225°,  respectively.  Similarly,  Figures.  6.2  (A)  and  (B)  show  the  digital  terrain 
model  created  by  the  developed  approach  with  Mallat's  wavelet  and  rotations  about  the  Z 
axis  of  45°  and  225°,  respectively. 

From  Figures  5.10,  6.1,  and  6.2,  three  digital  terrain  models  have  been  built. 
Basically,  the  digital  terrain  model  created  by  operating  the  stereo  plotter  is  used  to 
separately  compare  the  other  two  digital  terrain  models.  The  comparison  is  made  by 
comparing  the  elevation  difference  at  each  grid  from  different  digital  terrain  models  and 
areas  ( including  the  whole  area  covered  by  the  photos  and  two  sub-area  shown  m  Figure  6.3 
).  The  results  of  comparing  the  digital  terrain  model  created  by  the  developed  approach  with 
the  Laplacian  and  Gaussian  pyramid  with  the  digital  terrain  model  created  by  manually 
operating  the  stereo  plotter  is  shown  in  Figures.  6.4  (A)  and  (B).  Similarly,  the  results  of 
comparing  two  digital  terrain  models  ( one  created  by  the  developed  approach  with  Mallat's 
wavelet  pyramid  and  another  one  created  by  manually  operating  the  stereo  plotter )  is 
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presented  in  Figures  6.5  (A)  and  (B). 

In  Figures  6.4  (A)  and  (B),  eacli  figure  contains  three  plots;  the  plot  ( a )  and  ( a' ) 
represent  the  elevation  difference  of  the  whole  area  from  the  different  digital  terrain  models, 
the  plot  ( b )  and  ( b' ),  and  ( c )  and  ( c' )  show  the  elevation  differences  of  the  local  areas 
from  the  different  digital  terrain  models.  From  Figures  6.4  (  a )  and  (  a' ),  the  mean  of  the 
elevation  differences  for  the  whole  area  is  -0.206  (M)  and  the  RMS  of  the  elevation 
differences  of  this  area  is  0.509  (M).  Form  Figure  6.4  (  b  )  and  (  b' ),  the  digital  terrain 
model  is  described  much  more  smoothly.  The  mean  of  the  elevation  differences  for  the  area 
shown  in  the  Figure  6.3  is  -0.080  (M),  and  the  RMS  of  the  elevation  differences  for  this  area 
is  0.272  (M).  Furthermore,  from  Figure  6.4  (  c  )  and  (  c'  ),  there  are  higher  elevation 
differences  at  some  places  in  the  digital  terrain  model.  The  mean  of  the  elevation 
differences  for  the  area  shown  in  Figure  6.4  (A)  is  -0.161  (M)  and  the  RMS  of  the  elevation 
differences  of  this  area  is  0.299  (M).  For  this  area,  those  places  where  there  are  large 
elevation  differences  are  covered  by  the  bushes  and  trees.  The  results  of  the  elevation 
differences  shown  in  the  plots  (  a ),  (  a' ),  (  b  ),  ( b' ),  (  c  ) ,  and  (  c' )  are  summarized  in 
Table  6-5. 


Table  6-5  Summary  of  the  Results  of  the  Elevation  Differences  from  the  Plots  Shown 
in  the  Figures  6.4  ( A  )  and  (  B  ). 


Plots 

Mean  of  Elevation 

RMS  of  Elevation 

Difference  (M) 

Difference  (M) 

(a)&(a') 

-0.206 

0.509 

(b)&(b') 

-0.080 

0.272 
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Table  6-5  Summary  of  the  Results  of  the  Elevation  Differences  from  the  Plots  Shown 
in  the  Figures  6.4  ( A )  and  (  B  ).  (Continued) 


Plots 

Mean  of  Elevation 

RMS  of  Elevation 

Difference  (M) 

Difference  (M) 

(c)&(c') 

-0.161 

0.299 

Similarly,  in  Figure  6.5  ( A )  and  ( B ),  the  elevation  differences  between  the  digital 
terrain  model  created  by  the  developed  approach  with  Mallat's  wavelet  pyramid  and  the 
digital  terrain  model  created  by  manually  operating  the  stereo  plotter  are  presented.  Each 
figure,  Figure  6.5  ( A )  or  ( B ),  contains  three  plots;  plots  ( a )  and  ( a' )  show  the  elevation 
difference  of  the  whole  area  from  the  different  digital  terrain  models,  the  plots  ( b )  and  ( b'), 
and  ( c )  and  ( c' )  represent  the  elevation  differences  of  the  local  areas  ( shown  in  Figure  6.3 
)  from  different  the  different  digital  terrain  models. 

From  Figure  6.5  ( a )  and  (  a' ),  the  mean  of  the  elevation  differences  for  the  whole 
area  is  -0. 199  (M)  and  the  RMS  of  the  elevation  differences  for  this  area  is  0.452  (M).  From 
Figure  6.5  ( b )  and  ( b' ),  the  area  is  extracted  from  the  whole  area  shown  in  Figure  6.5  ( a 
)  and  ( a' ).  The  mean  of  the  elevation  difference  for  this  area  is  -0.094  (M)  and  the  RMS  of 
elevation  differences  is  0.300  (M).  In  Figure  6.5  ( c )  and  ( c' ),  a  smaller  area,  compared 
with  Figure  6.5  ( b )  and  ( b' ),  is  extracted  from  the  whole  area.  The  mean  of  the  elevation 
difference  for  this  area  is  -0.332  (M)  and  the  RMS  of  elevation  difference  is  0.309  (M).  The 
results  of  the  elevation  differences  shown  in  the  plots  ( a ),  ( a' ),  ( b ),  ( b'),  ( c ),  and  ( c' ) 
are  summarized  in  the  Table  6-6. 
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Table  6-6  Summary  of  the  Results  of  the  Elevation  Differences  from  the  Plots  Shown 
in  the  Figures  6.5  ( A )  and  ( B  ). 


Plots 

Mean  of  Elevation 

RMS  of  Elevation 

Difference  ( M ) 

Difference  ( M ) 

(a)&(a') 

-0.199 

0.452 

(b)i&(b') 

-0.094 

0.300 

(c)&(c') 

-0.332  ■ 

0.309 

6.4  Evaluating  the  Developed  System's  C  Factor 

Relative  vertical  accuracy  capabilities  of  different  stereoscopic  plotters  are  commonly 
compared  on  their  C  factors.  The  C  factor  is  the  ratio  of  the  flying  height  above  the  ground 
of  a  photo  to  the  contour  interval  reliably  plotted  by  using  that  photo,  and  is  represented  as 
follows: 

C  factor  =  (6-1) 
C.I. 

where  H  represents  the  flying  height  above  ground  of  the  photo  and  C.I.  represents  the 
contour  interval''*'.  In  general,  manufacturers  commonly  specify  C  factors  for  their 
instruments.  For  example,  the  C  factors  for  analytical  stereoplotters  are  between  2000  and 
2500. 

The  C  factor  for  the  developed  system  is  approximated  as  follows.  The  relationship 
of  the  RMS  of  the  photo  coordinates  and  the  resulting  accuracy  in  the  object  space  coordinate 
system  is 
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=  v^-cot(0)o  (6-2) 

s 


where  S  is  average  photo  scale;  6  is  the  overlap  angle;  o  is  the  RMS  of  the  photo 
coordinates'"'.  In  this  research,  the  difference  of  the  photo  coordinate  is 

Ay  =  y  ^  -  y 

where  M  indicates  that  the  coordinate,  (  x,  y ),  was  measured  from  stereo  plotter.  Kern- 14, 
and  C  indicates  that  the  coordinate,  (  x,  y  ),  was  calculated  from  the  developed  approach. 
From  error  propagation,  the  RMS  of  the  difference  of  the  photo  coordinate  is  represented  as 
follows: 


2  2  2 

Ax         .M  ,c 


Table  6-3  provides  o^^  =  ±0.037mmand  o^^^  =  ±0.028  mm.  The  stereoplotter,  Kem-14, 
has  a  ±0.001  mm  precision.  Then,  o^c  ^  ±0.037  mm  and  o^c  -  ±0.028mm.  The  RMS, 
o ,  of  the  photo  coordinates  calculated  by  the  developed  approach  is  ±  0.046  mm.  The 
is  calcualted  by  applying  Equation  6-2,  and  its  value  is  ±  0.32  M  if  the  photo  scale  is  1 :3000 
and  60  percent  overlap  is  adopted.  The  C  factor  for  the  developed  system  is  1600. 

If  the  elevation  difference,  AH ,  between  different  digital  terrain  models  is  considered 
and  represented  as  follows: 

AH  =  H„  -  H,  (6-5) 
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where  H^^  represents  the  height  of  each  grid  which  was  interpolated  by  data  collected  from 
manually  operating  the  stereoplotter,  Kern- 14,  and  represents  the  height  of  each  grid 
which  was  interpolated  by  data  collected  by  applying  the  developed  approach.  From  error 
propagation,  the  RMS  of  AH  is  calculated  as  follows. 


^Ih  =        +  (6-6) 

Tables  6-5  and  6-6  provide  a^^  ^  ±  0.50  (M)  and  a^^  ^  ±  0.30  (M)  for  the  whole  area 
and  sub-areas,  respectively.  The  C  factor  for  the  stereoplotter,  Kern- 14,  is  roughly  2000  and 
its  RMS,  ,  for  the  elevation  measurement  is  ±  0.25  (M)  if  the  flying  height  above 
ground  is  500  (M).  By  using  Equation  6-4,  the  values  of  a^^  are  ±  0.43  (M)  and  ±0.16 
(M)  for  the  whole  area  and  sub-areas,  respectively.  Then,  the  C  factor  for  the  whole  area  is 
1200  and  the  C  factor  for  the  sub-areas  is  3100. 

From  above  discussion,  it  was  discovered  that  for  the  whole  area,  the  C  factor  of  the 
developed  systems  is  around  1200  because  the  west-south  part  of  the  whole  area  is  covered 
with  a  lots  of  bushes  and  trees.  As  known,  it  is  very  difficult  to  collect  data  form  those  areas 
covered  with  trees  and  bushes  in  photogrammetry.  However,  the  developed  system  still 
cannot  solve  this  difficulty.  As  for  the  sub-areas,  the  C  factor  is  around  3100  because  the 
sub-areas  are  not  covered  with  many  trees  and  bushes.  Comparing  both  situations,  the  C 
factor  of  the  developed  system  for  the  test  areas  is  around  2000  ( by  taking  average  of  1 600, 
1200  and  3100  ),  and  is  very  close  to  the  C  factors  of  other  analytical  stereoplotters. 


CHAPTER  7 
CONCLUSIONS  AND  RECOMMENDATIONS 


*  >    . t'  •  -  ' 

7.1  Introduction 

This  research  of  generating  DTM  from  digitized  aerial  photos  of  a  complex  scene  by 
employing  pattern  recognition  with  the  Fourier  transform  and  multiresolutional  feature-based 
stereo  matching  has  led  to  several  conclusions.  These  conclusions  are  shown  in  this  chapter. 
However,  there  are  some  weaknesses  presented  by  applying  the  developed  approach.  The 
recommendations  focusing  on  these  weaknesses  are  made  and  presented  in  this  chapter. 

In  this  chapter,  Section  7.2  presents  the  conclusions  and  weaknesses  relevant  to  this 
research;  Section  7.3  proposes  the  recommendations  drawn  from  this  research. 

7.2  Conclusions  and  Weaknesses 

The  digital  terrain  models  generated  by  the  developed  approach  with  Laplacian  and 
Gaussian  and  Mallat's  wavelet  image  pyramids  are  similar.  Table  6-5  and  6-6  illustrate  that 
the  digital  terrain  models  created  by  the  developed  approach  with  Laplacian  and  Gaussian 
and  Mallat's  wavelet  image  pyramids  are  similar  in  the  test  areas  when  those  digital  terrain 
models  were  compared  with  the  digital  terrain  model  created  by  manually  operating  the 
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stereoplotter.  There  are  only  0.007  (M)  and  0.047  (M)  differences  by  comparing  the  means 
and  RMS  of  the  elevation  differences  for  the  whole  test  area  from  two  different  approaches, 
respectively. 

With  a  pre-built  template,  employing  pattern  recognition  with  the  Fourier  transform 
can  quickly  and  correctly  locate  its  position  from  a  scene  containing  the  template.  This 
method  has  several  advantages.  First,  it  can  correctly  locate  the  position  of  a  pre-built 
template  from  a  scene  containing  the  template  surrounded  with  a  complicated  background 
because  the  background  effects  have  been  considered.  Second,  the  template's  position  in  the 
template  image  ( shown  in  Appendix  B )  is  not  important  because  position's  shift  effects  has 
been  eliminated  when  the  phase  filter  is  created.  Third,  pattern  recognition  with  the  Fourier 
transform  is  clearly  and  well  explained  in  mathematics.  The  mathematical  explanation  is 
presented  in  Appendix  B. 

The  processing  time  of  using  Mallat's  wavelet  to  decompose  an  image  into  a  series 
of  images  with  different  scales  is  much  faster  than  that  of  using  Laplacian  and  Gaussian 
filters  with  different  sizes  to  decompose  the  image  into  a  series  of  images  with  different 
scales.  From  Table  5-8  and  Table  5-10,  when  a  Laplacian  and  Gaussian  filter  with  large  size 
was  used  to  decompose  an  original  image,  the  processing  speed  is  roughly  20  times  slower 
than  that  of  using  Mallat's  wavelet. 

The  stereo  matching  scheme  used  in  this  research  is  efficient.  The  multiresolutional 
feature-based  approach  for  stereo  matching  can  eliminate  those  false  matching  occurring  at 
the  coarser  scales,  and  its  processing  time  is  much  faster  than  that  of  using  area-based 
approach  for  stereo  matching.   The  proposed  approach  at  the  coarsest  scale  requires 
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significant  computation  because  it  involves  an  approach  to  extend  the  size  of  search  area  ( 
known  as  adapted  approach  )  in  order  to  deal  with  the  situation  when  a  large  parallax 
appears.  This  processing  time  takes  roughly  one-third  of  the  total  processing  time.  After  the 
matched  results  were  located  from  the  images  with  the  coarsest  scale,  the  processing  speeds 
for  image  with  finer  scales  are  fast  because  the  matched  results  at  the  coarsest  scale  provide 
good  approximations  for  the  images  with  finer  scales. 

However,  the  C  factor  approximated  from  previous  chapter  cannot  really  present  the 
developed  system's  C  factor  because  only  few  test  areas  and  images  have  been  investigated 
in  this  research.  It  is  difficult  to  make  a  conclusion  about  developed  system's  C  factor 
according  to  the  limited  statistical  significance  studied  in  this  research. 

There  are  some  difficulties  when  pattern  recognition  with  the  Fourier  fransform  is 
applied  to  locate  the  position  of  a  template.  For  example,  when  a  template  shown  in  a  scene 
is  rotated,  the  method  fails  to  locate  the  template's  position  from  the  scene.  An  example  is 
given  in  Appendix  B.  Another  difficulty  is  that  when  a  scene  contains  a  template  whose 
scales  in  x  and  y  directions  have  been  changed,  the  method  fails  to  locate  the  template's 
position  from  the  scene.  According  to  personal  study,  it  was  determined  that  when  the  scales 
in  X  and  y  directions  were  changed  from  1 .0  to  0.9,  the  method  still  can  correctly  locate  the 
template's  position  from  the  scene.  If  the  scales  in  x  and  y  directions  were  changed  to  0.8 
or  smaller,  the  template's  position  begins  to  move  away  from  its  correct  position. 
Furthermore,  if  a  scene  contains  a  template  whose  scales  in  x  and  y  directions  have  been 
changed  and  it  has  been  rotated ,  the  method  fails  to  locate  the  template's  position  from  the 
scene.  Recent  research  has  shown  that  using  pattern  recognition  with  the  Fourier  transform 
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can  locate  the  position  of  a  rotated  template  with  the  same  scale  factor  in  the  x  and  y 
directions  from  a  scene^'^^f''''.  However,  their  approaches  still  fail  when  a  template  has  scale 
factors  in  the  x  and  y  directions,  and  scale  factors  have  been  have  been  changed  too  much. 

Furthermore,  it  is  difficult  to  find  the  stereo  correspondences  from  trees  and  bushes. 
The  developed  approach  still  has  difficulty  to  correctly  locate  the  stereo  correspondences 
from  trees  or  bushes  because  many  similar  and  complex  features  exist  in  trees  or  bushes. 

7.3  Recommendations  for  Future  Research 

The  recommendations  for  future  research  drawn  from  this  research  are  described  as 
follows. 

It  is  necessary  to  run  enough  tests  such  that  the  developed  system's  C  factor  can  be 
concluded  in  statistics.  In  future,  images  with  different  scales  or  resolutions  should  be  used 
to  test  the  developed  system  to  find  out  what  kinds  of  accuracy  the  developed  system  can 
reach.  Furthermore,  images  containing  complex  or  simple  textures  are  tested  to  determine 
how  good  the  developed  system  is. 

When  the  shape  and  intensity  distribution  of  the  template  shown  in  the  scene  are 
changed,  the  pattern  recognition  with  Fourier  transform  used  in  this  research  to  locate  the 
positions  of  the  fiducial  marks  cannot  accurately  locate  the  position  of  the  template  from  the 
scene.  Therefore,  a  more  robust  approach  must  be  developed  to  solve  this  problem.  Two 
approaches  are  recommended  for  fiiture  study.  One  is  to  create  a  learning  algorithm  by 
employing  the  concept  of  neural  network.  First,  when  the  learning  algorithm  is  created,  if 
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the  algorithm  meets  a  template  which  does  not  exist  in  the  database  of  the  learning  algorithm, 
the  algorithm  can  automatically  train  itself  to  adapt  the  new  template  and  put  the  template 
into  its  database  such  that  the  created  learning  algorithm  by  applying  neural  network  can 
locate  the  position  of  the  template  to  a  certain  level.  Second,  the  learning  algorithm  can 
locate  the  template's  position  from  a  scene  when  the  template  has  been  rotated  and  its  scale 
factors  in  x  and  y  directions  have  been  changed.  Another  approach  is  to  modify  the 
technique  of  pattern  recognition  with  Fourier  transform  used  in  this  research.  At  least,  let 
the  modified  technique  provide  a  good  approximation  on  the  location  of  the  template  in  the 
scene.  Then,  iterative  least-squares  approach  can  be  used  to  refine  the  location  of  the 
template  to  a  tolerable  level. 

Bushes,  trees,  and  buildings  always  provide  some  kind  of  noise  during  stereo 
matching.  If  the  effects  from  bushes,  trees,  and  buildings  can  be  eliminated  from  a  given 
stereo  pair,  the  digital  terrain  model  created  from  the  stereo  pair  can  provide  a  more  accurate 
description  on  a  specified  ground  surface.  One  approach  is  proposed  to  solve  this  problem 
in  future  study.  The  first  is  to  develop  an  algorithm  such  that  an  operator  can  specify  the 
ranges  for  those  bushes,  trees,  and  buildings,  and  for  those  areas  falling  within  the  specified 
range,  their  boundaries  can  be  automatically  extracted.  After  the  boundaries  are  located,  the 
information  within  the  boundaries  is  eliminated.  Then,  stereo  matching  is  applied  on  these 
processed  images.  The  processing  time  can  be  significantly  reduced  because  the  information 
in  the  given  image  has  been  decreased. 


APPENDIX  A 

LINEARIZATION  OF  THE  COLLINEARITY  EQUATIONS 


The  colinearity  equation  is  written  as 

m3,(X.  -  X,^)  .  m3^(Y.  -  Y^^)  .  0133(2.  -  Z^<^) 

(A-1) 

^m^,(X.  -X,^)  .m^^(Y.  -  Y^^)  ^  m„(Z.  -  Z^^) 
m„(X.  -  X.'=)  +  m„(Y.  -  Y.^)  +  m(Z.  -  Z.^) 

3n     J  1    ^  32'-     J  1  33'-    J  I 

The  nonlinear  equations  are  linearized  using  Taylor's  theorem.  In  linearizing  the  collinearity 
equations,  Equation  A-1  are  re-written  as  follows: 
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where 


X..  ,  y..  :  the  photo  coordinates  of  the  j-th  ground  point  on  the  i-th  photo. 
Xp  ,  yp  :  the  photo  coordinate  of  the  principal  point  of  a  photo. 


Let 


.  r 
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k,  ,      ,  kj  :  the  correction  coefficients  for  Gaussian  symmetric  radial  distortion. 

p,  ,  P2  :  the  correction  coefficients  for  decentering  distortion. 

c        c  c 

Xj   ,  Yj  ,  Zj  :  the  object  space  coordinates  of  the  i-th  exposure  station. 
X.  ,  Yj  ,  Z.  :  the  object  space  coordinates  of  the  j-th  ground  object, 
f :  the  focal  length  of  the  camera. 


aX.,  =  X.  -  X,'^ 


aY..  =  Y.  - 


aZ..  =  Z.  - 
J'        J  ' 


,2        ,4  ,6 

O      =     k,ry       +     kjfij       +  kjFy 


q  =  k,  +  2k,r^  +  3k3r; 


r  =  m3,AX.j  +  mjjAY..  +  itIjjAZ.j 


s  =  m„AX..  +  m.jAY.;  +  01,3  aZ.. 


t  =  m^iAX.;  +  m^^AY..  +  m^jAZj. 


In  Equation  A-2,  each  derived  component  is  described  as  follows: 
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In  Equation  A-3,  each  derived  component  is  written  as  follows. 
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The  matrix  form  is  shown  in  Equation  A-4. 
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For  simplicity,  the  symbolic  representation  of  Equation  A-4  is  shown  as  follows. 

V,.  +  BA  +  B  a   +  B A   =  e  (A-5) 


APPENDIX  B 

USING  THE  FOURIER  TRANSFORM  TO  LOCATE  A  TEMPLATE'S  POSITION 
FROM  A  SCENE  CONTAINING  THE  TEMPLATE 


Let  f(  X,  y )  represent  a  scene  containing  a  template,  t(  x,y ),  and  b(  x,y )  represent 
the  background  effects  on  the  scene.  Then,  f(  x,y )  can  be  represented  as 

f(x,y)  -  t(x,y)  +  b(x,y)  (B-1) 

Taking  the  Fourier  transform  of  f(  x,  y).  Equation  6-2  is  written  as  follows 

F(u,v)  =  T(u,v)  +  B(u,v)  (B-2) 

where  F(  u,  v ),  T(  u,  v ),  and  B(  u,  v )  represent  the  Fourier  transform  of  f(x,y),  t(  x,  y),  and 
b(  X,  y),  respectively.  Let  t^  ( x ,  y )  represent  any  template  function  and  its  Fourier  transform 
is     ( u ,  V ) .  If  Equation  6-3  is  divided  by     ( u ,  v ) ,  its  result  is  represented  as  follows. 

F(u>v)    ^   T(u,v)   ^  B(u,v) 

T^(u.v)      T^(u,v)      T^(u,v)  ^  '^^ 


Ift(x,y)  =  t.(x,y),  Equations  6-4  is  rewritten  as 


F(u,v)  .  ^  B(u,v) 

  =  0(u,v)  +  — i  (^.4) 
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where  Sis  a  delta  function.  In  mathematics,  when  the  inverse  Fourier  transform  is  applied 
on  the  delta  function,  its  value  is  very  large.  This  is  the  reason  the  position  with  the  highest 
DN  after  applying  pattern  recognition  with  the  Fourier  transform  is  chosen  as  the  center 
position  of  the  template. 

A  pre-built  template  is  shown  in  Figure  B.l  ans  a  scene  containing  the  template  is 
shown  in  the  Figure  B.2.  The  center  of  the  template  is  located  at  the  point  with  the  highest 
DN  from  the  scene  by  employing  the  Fourier  transform  approach  mentioned  above.  The 
result  is  shown  in  Figure  B.3. 

, ,  :  However,  when  a  template  shown  in  a  scene  is  rotated,  the  method  fails  to  locate  the 
template's  position  from  the  scene.  Figure  B.4  shows  a  scene  containing  a  rotated  template 
and  Figure  B-5  shows  the  result  when  the  method  was  applied. 
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